Data Loaders

Data Loaders fetch data from different sources and convert it into a form that can be loaded into a vector database (Semantic Search node). The output of a loader is usually connected to the "documents" edge of a Semantic Search node.

Pairing a data loader with a semantic search query is a common pattern in LLM architectures because LLMs have limited context windows (e.g., you can't place all the files in your organization into an LLM at once). By placing data into a vector database, you can query it and retrieve only the most relevant pieces, which an LLM can then reason over.
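To make the retrieval idea concrete, here is a minimal sketch in Python. It is not how the Semantic Search node is implemented: a real vector database uses learned embeddings, whereas this toy version uses plain word counts as a stand-in, and the function names (`embed`, `cosine`, `top_k`) and sample documents are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical documents, as a data loader might produce them.
documents = [
    "Refund requests are processed within five business days.",
    "The office is closed on public holidays.",
    "To request a refund, email support with your order number.",
]

# Only the most relevant pieces would be passed on to the LLM.
relevant = top_k("How do I get a refund?", documents, k=2)
```

Only the two refund-related documents are returned, so the LLM receives a small, relevant context instead of the whole collection.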

Note: data loaders load data into Semantic Search nodes when the pipeline is run. To have a persistent knowledge base, use the "Knowledge base reader" node, which creates a permanent vector database that you can query.

See below for a common architecture of using a data loader (URL loader) with Semantic Search.

File Loader

The File node allows you to select a file that has been uploaded under the "Storage" tab >> "File" sub-tab of the platform.

Additionally, you may upload files by clicking on "Upload File" on the file node. Uploaded files will appear in your storage tab.

File Loader Usage

Since files are stored in their raw format, you can check the "Process Files into Text" box to convert the contents of the files into text (a format that can be accepted by the Semantic Search node). The File node is commonly connected to the LLM and Semantic Search nodes, which accept data of type text, not file.

CSV Query

CSV Query nodes answer queries (through the "query" edge) using a CSV file (through the "csv" edge) and can be used without an LLM node.

URL Loader

URL Loader nodes crawl the webpage at an input URL and return documents.

A URL Loader node can take an Input node result or a Text node result as input (through the "url" edge). The associated webpage can be a blog, docs, or any website that contains text and requires no authentication.
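As a rough illustration of what "URL in, text out" involves, the sketch below fetches a public page and strips it down to visible text using only the Python standard library. This is an assumption-laden stand-in, not the platform's crawler: the names `TextExtractor`, `html_to_text`, and `load_url` are hypothetical, and the sample HTML is made up.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Convert an HTML string to plain text joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def load_url(url: str) -> str:
    """Fetch a public page (no auth) and return its text. Makes a network request."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))


# Offline demonstration on a made-up page.
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Docs</h1><p>Hello <b>world</b></p>"
    "<script>var x=1;</script></body></html>"
)
text = html_to_text(sample)
```

Pages behind a login would return a sign-in page or an error rather than the content, which is why the loader expects sites that require no authentication.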

Wikipedia Loader

The Wikipedia Loader node receives a keyword (through the "query" edge) from either user input (Input block) or a Text block, and returns relevant information from Wikipedia.

YouTube Loader

The YouTube Loader node receives a YouTube URL (through the "URL" edge) from either user input (Input block) or a URL placed in a Text block.

ArXiv Loader

The ArXiv Data Loader node takes in keywords (text) from either an Input node or a Text node (through the "query" edge) and returns relevant ArXiv papers/excerpts as documents.

There are several different internet search providers available under the DataLoaders tab.

Different internet search tools focus on different areas of search and can provide specialized results. We provide the following options to support your search needs.

  • You.com: General search leveraging the You.com search engine

  • You.com News: Real-time news updates powered by You.com

  • Exa.ai: General search powered by Exa AI

  • Exa.ai Companies: Search for company data provided by Exa AI

  • Exa.ai Research: Search over research papers

To get the best results from your searches, choose the appropriate node for the type of data you want. Also refer to the provider's documentation for how to format your search queries.

See the You.com API documentation

See the Exa AI prompting guide

Here is an example pipeline using the Exa AI companies index to research companies.

Note: search results are often passed to a Semantic Search database so that they can be queried. Another common architecture uses an LLM to improve the user query before passing it into the "query" edge of one of these internet search nodes.
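The data flow described in this note can be sketched as three steps chained together. Everything here is hypothetical: `search_pipeline` and its three callables are placeholders for the LLM node, an internet search node, and a Semantic Search node, and the stub implementations exist only so the example runs offline. Passing the original user query to the retrieval step is one possible choice, not something the platform prescribes.

```python
from typing import Callable


def search_pipeline(
    user_query: str,
    rewrite_query: Callable[[str], str],
    search: Callable[[str], list[str]],
    retrieve: Callable[[str, list[str]], list[str]],
) -> list[str]:
    """Chain the three steps: improve the query, search, then query the results."""
    improved = rewrite_query(user_query)   # LLM improves the user query
    results = search(improved)             # internet search node ("query" edge)
    return retrieve(user_query, results)   # semantic search over the results


# Stubs standing in for real nodes (assumptions for illustration only).
def stub_rewrite(query: str) -> str:
    return query.strip().rstrip("?") + " company overview"


def stub_search(query: str) -> list[str]:
    return [f"result for: {query}", "unrelated page"]


def stub_retrieve(query: str, docs: list[str]) -> list[str]:
    return [d for d in docs if d.startswith("result")]


answer_context = search_pipeline("Acme Corp?", stub_rewrite, stub_search, stub_retrieve)
```

Keeping each step as a separate callable mirrors how the nodes are wired together in a pipeline: any one of them can be swapped without changing the others.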
