
Sources

With LoyJoy, you can add your own data as external sources, which the GPT model then uses to generate answers to user questions. To add a source, enter its URL into the input field. LoyJoy automatically recognizes the file type behind the URL.

gpt_sources_input_pdf gpt_sources_input_csv

Index Your Website

Besides individual files, you can also index entire websites by entering their URL. LoyJoy first looks for a sitemap associated with the URL. A sitemap is a file that lists which pages belong to a website and should be crawled. If a sitemap is found, LoyJoy shows how many pages will be crawled.

gpt_sources_sitemap
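To illustrate what a sitemap contains and how a page count can be derived from it, here is a minimal sketch. It assumes a standard sitemaps.org `urlset` document; the function name `list_sitemap_urls` and the example URLs are hypothetical, and this is not LoyJoy's actual implementation.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for an example site.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/recipes</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def list_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return the page URLs listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


urls = list_sitemap_urls(EXAMPLE_SITEMAP)
print(f"{len(urls)} pages would be crawled")
```

A sitemap index file (a sitemap that points to other sitemaps) uses `sitemapindex` instead of `urlset` and is not handled by this sketch.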

If no sitemap is found, you can still crawl the website. In that case, the crawler collects the links on the homepage, then the links on each linked page, and so on. This approach takes more time and is limited to at most 800 pages.

gpt_sources_no_sitemap
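The link-following approach described above can be sketched as a breadth-first traversal with a page limit. This is only an illustration under stated assumptions: the function `crawl`, the `get_links` callback, and the in-memory `SITE` link graph are hypothetical, and a real crawler would fetch and parse pages over HTTP. The 800-page limit is the one stated above.

```python
from collections import deque
from typing import Callable, Iterable

MAX_PAGES = 800  # limit stated in the documentation above

# Hypothetical link graph standing in for a real website:
# each page maps to the links found on it.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = MAX_PAGES,
) -> list[str]:
    """Visit pages breadth-first, starting at the homepage."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # pages waiting to be visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("/", lambda url: SITE.get(url, []))
print(pages)
```

Because every page has to be visited before its links are known, the total number of pages cannot be reported up front, which is one reason this approach is slower than using a sitemap.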

info

If you know the URL of a sitemap, you can enter it directly. This is also helpful if your website has multiple sitemaps and you want to select a specific one.

Expert Settings

To further specify the behaviour of the crawler, take a look at the expert settings. To reduce the load on your web server, you can set a delay in seconds, which makes the crawler pause after each request. You can also restrict indexing to certain parts of your pages by providing a set of CSS selectors.

gpt_sources_expert_settings
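As a rough illustration of what the delay setting does, the following sketch pauses after each request. The names `fetch_with_delay`, `fetch`, and `delay_seconds` are hypothetical and only serve the example; the `sleep` parameter is injectable so the pauses can be observed without waiting.

```python
import time
from typing import Callable, Iterable


def fetch_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Fetch each URL in turn, pausing after every request."""
    results = []
    for url in urls:
        results.append(fetch(url))
        sleep(delay_seconds)  # gives the web server time between requests
    return results


# Record the pauses instead of actually sleeping.
pauses: list[float] = []
bodies = fetch_with_delay(["a", "b"], str.upper, 1.5, sleep=pauses.append)
print(bodies, pauses)
```

With a delay of 1.5 seconds, crawling 800 pages adds 20 minutes of waiting time, so the delay trades crawl speed for lower server load.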

These CSS selectors act as a filter on the crawled pages: only content inside elements that match the selectors is indexed. This is useful if you want to exclude certain parts of your website from being indexed. On a recipe site, for example, we might only want to include the text in the element with the ID recipe_detail_container and in the elements with the class utils. To achieve this, we can use the following CSS selectors (#recipe_detail_container for the ID and .utils for the class):

gpt_sources_css_selectors

In most browsers, you can right-click an element and choose "Inspect" to find its CSS selectors. Check out this guide for more information on how to use CSS selectors.
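To make the filtering effect concrete, here is a minimal sketch that keeps only text inside elements matching an ID or class selector. It is an assumption-laden illustration, not LoyJoy's implementation: the class `SelectorTextExtractor`, the function `extract_text`, and the sample page are hypothetical, only simple `#id` and `.class` selectors are supported, and well-formed HTML is assumed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Hypothetical recipe page.
PAGE = """
<html><body>
  <nav>Home | Recipes | About</nav>
  <div id="recipe_detail_container"><h1>Pancakes</h1><p>Mix flour,<br>milk and eggs.</p></div>
  <div class="utils tips">Serves 4</div>
  <footer>Copyright</footer>
</body></html>
"""


class SelectorTextExtractor(HTMLParser):
    """Collect text found inside elements matching #id or .class selectors."""

    def __init__(self, selectors):
        super().__init__()
        self.ids = {s[1:] for s in selectors if s.startswith("#")}
        self.classes = {s[1:] for s in selectors if s.startswith(".")}
        self.depth = 0    # greater than 0 while inside a matching element
        self.chunks = []  # collected text fragments

    def _matches(self, attrs):
        attr = dict(attrs)
        if attr.get("id") in self.ids:
            return True
        return bool(self.classes & set((attr.get("class") or "").split()))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0 or self._matches(attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, selectors):
    parser = SelectorTextExtractor(selectors)
    parser.feed(html)
    parser.close()
    return parser.chunks


print(extract_text(PAGE, ["#recipe_detail_container", ".utils"]))
```

In this sketch, the navigation and footer text are dropped because they sit outside the matching elements, while the recipe content and the element with the class utils are kept.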

Exclusions

When you add your website and its subpages for indexing, you may want to keep certain pages out of the index. LoyJoy provides an exclusion feature for this. By clicking the "exclude" option, you can create rules that exclude URLs based on a given string: any URL that contains the specified string is left out of the index. To add further rules or remove existing ones, open "Manage excluded paths". You can also customize a rule to exclude URLs that, for example, start with a specific string, so that all subpaths are excluded as well.

gpt_sources_exclusions_rules
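The two rule types described above, "contains" and "starts with", can be sketched as follows. The class `ExclusionRule`, the mode names, and the example URLs are hypothetical and chosen only for illustration; how LoyJoy stores and evaluates rules internally is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExclusionRule:
    pattern: str
    mode: str = "contains"  # "contains" or "starts_with" (hypothetical names)

    def matches(self, url: str) -> bool:
        if self.mode == "starts_with":
            return url.startswith(self.pattern)
        return self.pattern in url


def filter_urls(urls, rules):
    """Keep only the URLs that no exclusion rule matches."""
    return [u for u in urls if not any(r.matches(u) for r in rules)]


URLS = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/recipes/print/pancakes",
    "https://example.com/recipes/pancakes",
]
RULES = [
    ExclusionRule("/print/"),  # excludes any URL containing "/print/"
    ExclusionRule("https://example.com/blog", "starts_with"),  # excludes all blog subpaths
]

print(filter_urls(URLS, RULES))
```

The "starts with" rule removes every page under the blog path in one go, whereas the "contains" rule catches the print views wherever they appear in the URL.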