Data Loaders

Nodes to crawl and process external information into structured data

Data Loaders fetch and convert data from different sources into forms that can be loaded into a Vector database (Semantic Search node). The output of these loaders is often connected to a Semantic Search node (specifically, the "documents" edge).

The data loader / semantic search pattern is common in LLM architectures because LLMs have limited context windows (e.g., you can't place all the files in your organization into an LLM at once). Placing data into a vector database lets you query it and retrieve only the most relevant pieces, which an LLM can then reason over.
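The retrieval idea can be illustrated with a minimal sketch. It is not the platform's implementation: it assumes simple word-count vectors in place of learned embeddings, and the names `tokenize`, `cosine_similarity`, and `top_k` are hypothetical helpers introduced only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Stand-in for documents produced by a data loader (illustrative data only).
documents = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Employees accrue 1.5 vacation days per month.",
]

# Only the most relevant piece is placed into the LLM prompt.
relevant = top_k("How many vacation days do employees get?", documents, k=1)
prompt = "Answer using only this context:\n" + "\n".join(relevant)
```

A real vector database would replace the word-count vectors with embeddings, but the flow is the same: load documents, score them against the query, and pass only the top matches to the LLM.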

Note: data loaders load data into semantic search nodes when the pipeline is run. To have a persistent knowledge base, use the "Knowledge base reader" node, which creates a permanent vector database that you can query.

See below for a common architecture of using a data loader (URL loader) with Semantic Search.

File Loader

The File node allows you to select a file that has been uploaded under the "Storage" tab >> "File" sub-tab of the platform.

Additionally, you may upload files by clicking on "Upload File" on the file node. Uploaded files will appear in your storage tab.

File Loader Usage

Since files are stored in their raw format, you can check the "Process Files into Text" box to convert the contents of the files into text (a format that can be accepted by the Semantic Search node). The file node is commonly connected to the LLM and Semantic Search nodes, which accept data of type text, not file.

CSV Query

CSV Query nodes provide answers to queries (through the "query" edge) using a CSV file (through the "CSV" edge) and can be used without an LLM node.

URL Loader

URL Loader nodes crawl the webpage of an associated input URL and return documents.

A URL Loader can take an Input node result or a Text node result as input (through the "URL" edge). The associated webpage can be a blog, documentation, or any website that contains text and does not require authentication.
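As a rough illustration of what turning a webpage into text involves, the sketch below strips markup from HTML using only the Python standard library. It is an assumption-laden example, not the node's actual crawler: `TextExtractor`, `html_to_text`, and `fetch_page` are hypothetical names, and the sample HTML is made up.

```python
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_page(url):
    """Download a public page (no authentication) and return its HTML."""
    with urllib.request.urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Offline demonstration on a made-up page; fetch_page is not called here.
sample_html = """
<html><head><title>Docs</title><style>body { color: red; }</style></head>
<body><h1>Getting started</h1><p>Install the package.</p>
<script>console.log("ignored");</script></body></html>
"""
page_text = html_to_text(sample_html)
```

For a live page, `html_to_text(fetch_page(url))` would produce the text that could then be split into documents.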

Wikipedia Loader

Receives a keyword (through the "query" edge) from either user input (an Input block) or a Text block, and returns relevant information from Wikipedia.

YouTube Loader

Receives a YouTube URL (through the "URL" edge) from either user input (an Input block) or a URL placed in a Text block.

ArXiv Loader

The ArXiv Data Loader node takes in keywords (text) from either an Input node or a Text node (through the "query" edge) and returns relevant ArXiv papers/excerpts as documents.

There are several different internet search providers available under the DataLoaders tab.

Different internet search tools focus on different areas of search and can provide specialized results. We provide the following options to support your search needs.

  • You.com: general search leveraging the you.com search engine

  • You.com News: real-time news updates powered by you.com

  • Exa.ai: general search powered by Exa AI

  • Exa.ai Companies: search for company data provided by Exa AI

  • Exa.ai Research: search over research papers

To get the best results from your searches, choose the appropriate node for the type of data you want. Also, refer to the provider's documentation for how to format your search queries.

See the you.com API documentation

See the Exa AI prompting guide

Here is an example pipeline using the Exa AI companies index to research companies.

Note: the search results are often passed to a Semantic Search node to allow for queries over them. Another common architecture uses an LLM to improve the user query before passing it into the "query" edge of one of these internet search nodes.

API Loader

The API node is your best option if you want to make a call to a third-party API.

We currently offer the following options:

  • Method: get, post, put, delete, patch

  • URL: the base URL that the request will be sent to

  • Headers: these can be passed in as key-value pairs

  • Params: either body or query params also as key-value pairs

By default, the node includes one empty key-value pair for each of the headers and the body of the request. You can either hard-code these values in the node itself or add additional inputs to the node by including variables in the form {{variable}} in any of the fields. If your request does not need any headers and/or body parameters, you can click the delete icon on the right side of the node to remove them.

The output of the node will be JSON containing the response to your request. We recommend passing this JSON as context to an LLM and continuing with your pipeline.
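To make the {{variable}} substitution concrete, here is a minimal sketch of how such a request could be assembled in plain Python. It is not the node's implementation: `fill` and `build_request` are hypothetical helpers, the URL and values are placeholders, and the choice to send params as a query string for GET/DELETE and as a JSON body otherwise is an assumption made for the example.

```python
import json
import re
import urllib.parse
import urllib.request

# Matches {{name}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill(template, variables):
    """Replace each {{name}} in template with variables[name]."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def build_request(method, url, headers, params, variables):
    """Build (but do not send) a urllib Request from node-style settings."""
    method = method.upper()
    url = fill(url, variables)
    headers = {fill(k, variables): fill(v, variables) for k, v in headers.items()}
    params = {fill(k, variables): fill(v, variables) for k, v in params.items()}
    body = None
    if method in ("GET", "DELETE"):
        # Assumption: params become query params for these methods.
        if params:
            url += "?" + urllib.parse.urlencode(params)
    else:
        # Assumption: params become a JSON body for POST, PUT, and PATCH.
        body = json.dumps(params).encode("utf-8")
        headers.setdefault("Content-Type", "application/json")
    return urllib.request.Request(url, data=body, headers=headers, method=method)


# Placeholder endpoint and values, for illustration only.
request = build_request(
    method="get",
    url="https://api.example.com/v1/{{resource}}",
    headers={"Authorization": "Bearer {{token}}"},
    params={"q": "{{query}}"},
    variables={"resource": "articles", "token": "abc123", "query": "vector search"},
)
```

Sending the request with `urllib.request.urlopen(request)` and parsing the response with `json.load` would yield the kind of JSON that can then be passed as context to an LLM.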
