Data Loaders
Nodes to crawl and process external information into structured data
Data Loaders fetch data from different sources and convert it into a form that can be loaded into a vector database (the Semantic Search node). The output of these loaders is typically connected to a Semantic Search node (specifically, the "documents" edge).
The data loader / semantic search pattern is common in LLM architectures because LLMs have limited context windows (for example, you can't place every file in your organization into an LLM at once). By loading data into a vector database, you can query for the most relevant pieces and then reason over them with an LLM.
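The retrieval idea behind this pattern can be sketched in a few lines. This is a conceptual illustration, not platform code: it uses a toy bag-of-words "embedding" in place of a learned embedding model, but the shape is the same, embed the documents, embed the query, and return only the closest matches.

```python
# Conceptual sketch of the data-loader / semantic-search pattern.
# A real pipeline uses a learned embedding model and a vector database;
# here a bag-of-words vector stands in for both.
from collections import Counter
import math

def embed(text):
    # Toy "embedding": word counts. Illustrative only.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, documents, k=1):
    # Return the k documents most similar to the query.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Quarterly revenue grew 10 percent year over year.",
    "The office kitchen will be closed on Friday.",
]
print(top_k("revenue growth", docs))
```

Only the top-ranked pieces, rather than the whole corpus, would then be passed to an LLM as context.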
Note: data loaders load data into Semantic Search nodes each time the pipeline runs. For a persistent knowledge base, use the "Knowledge base reader" node, which creates a permanent vector database that you can query over.
See below for a common architecture of using a data loader (URL loader) with Semantic Search.
The File node allows you to select a file that has been uploaded under the "Storage" tab >> "File" sub-tab of the platform.
Additionally, you may upload files by clicking on "Upload File" on the file node. Uploaded files will appear in your storage tab.
Since files are stored in their raw format, you can check the "Process Files into Text" box to process the contents of the files into text (a format that can be accepted by the Semantic Search node). Common nodes to connect the File node to include the LLM and Semantic Search nodes, which accept data of type text, not file.
CSV Query nodes provide answers to queries (received through the "query" edge) using a CSV file (the "CSV" edge) and can be used without an LLM node.
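Conceptually, answering a query over a CSV means filtering or aggregating its rows. The sketch below is a hypothetical illustration of that idea (the column names and data are made up, and the platform's actual node accepts natural-language queries rather than a column/value pair):

```python
# Hypothetical sketch of what a CSV query does conceptually:
# filter rows of a CSV that match a condition.
import csv
import io

CSV_DATA = """name,department,salary
Alice,Engineering,120000
Bob,Sales,90000
Carol,Engineering,110000
"""

def query_csv(csv_text, column, value):
    # Return all rows where the given column equals the given value.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == value]

print(query_csv(CSV_DATA, "department", "Engineering"))
```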
URL Loader nodes crawl the webpage of an associated input URL and return documents.
The URL Loader can take an Input node result or a Text node result as input (through the "URL" edge). The associated webpage can be a blog, documentation, or any website that contains text and requires no authentication.
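The core of what a URL loader does is strip a page's markup down to text that a Semantic Search node can accept. A minimal sketch of that extraction step, assuming the page has already been fetched (a real loader would also fetch the URL, e.g. with urllib, and handle many more edge cases):

```python
# Minimal sketch of the crawl-to-text step: strip HTML markup,
# skipping script/style content, and keep only visible text.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = False

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

print(html_to_text("<html><body><h1>Docs</h1><p>Hello world.</p></body></html>"))
```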
The Wikipedia Loader node receives a keyword (through the "query" edge) from either user input (Input node) or a Text node, and returns relevant information from Wikipedia.
The YouTube Loader node receives a YouTube URL (through the "URL" edge) from either user input (Input node) or a URL placed in a Text node.
The ArXiv Data Loader node takes in keywords (text) from either an Input node or a Text node (through the "query" edge) and returns relevant ArXiv papers/excerpts as documents.
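For a sense of what such a node does under the hood, arXiv exposes a public query API that takes keyword searches. The sketch below only builds a request URL for that API; fetching and parsing the Atom feed it returns are left out, and the parameter values shown are illustrative:

```python
# Sketch of a keyword search against arXiv's public API
# (http://export.arxiv.org/api/query). Builds the URL only.
from urllib.parse import urlencode

def arxiv_query_url(keywords, max_results=5):
    # search_query uses arXiv's "all:" field prefix to search all fields.
    params = {
        "search_query": f"all:{keywords}",
        "start": 0,
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)

print(arxiv_query_url("retrieval augmented generation"))
```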
There are several internet search providers available under the Data Loaders tab.
Different internet search tools focus on different areas of search and can provide specialized results. We provide the following options to support your search needs.
You.com: general search leveraging the you.com search engine
You.com News: real-time news updates powered by you.com
Exa.ai: general search powered by Exa AI
Exa.ai Companies: search for company data provided by Exa AI
Exa.ai Research: search over research papers
To get the best results from your searches, choose the appropriate node for the type of data you want, and reference the provider's documentation for how to format your search queries.
See the you.com API documentation
See the Exa AI prompting guide
Here is an example pipeline using the Exa AI companies index to research companies.
Note: search results are often passed to a Semantic Search node to allow queries over the data. Another common architecture uses an LLM to improve the user query before passing it into the query edge of one of these internet search nodes.
The API node is your best option if you want to make a call to a third-party API.
We currently offer the following options:
Method: GET, POST, PUT, DELETE, or PATCH
URL: the base URL that the request will be sent to
Headers: these can be passed in as key-value pairs
Params: either body or query parameters, also as key-value pairs
By default, the node includes one empty key-value pair each for the headers and body of the request. You can either hard-code these values in the node itself or add additional inputs to the node by including variables in the form {{variable}} in any of the fields. If your request does not need any headers and/or body parameters, you can click the delete icon on the right side of the node to remove them.
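To make the templating behavior concrete, here is a rough sketch of how {{variable}} substitution and key-value headers/params might translate into a plain HTTP request. This is not the platform's implementation; the URL and variable names are placeholders, and the request is only constructed, never sent:

```python
# Sketch of {{variable}} templating plus key-value headers/params
# assembled into an HTTP request (constructed but not sent).
import re
from urllib.parse import urlencode
from urllib.request import Request

def fill_template(text, variables):
    # Replace {{name}} placeholders with supplied values.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], text)

def build_request(method, url, headers=None, query_params=None, variables=None):
    variables = variables or {}
    url = fill_template(url, variables)
    if query_params:
        filled = {k: fill_template(v, variables) for k, v in query_params.items()}
        url += "?" + urlencode(filled)
    return Request(url, headers=headers or {}, method=method.upper())

req = build_request(
    "get",
    "https://api.example.com/users/{{user_id}}",   # placeholder URL
    headers={"Authorization": "Bearer TOKEN"},     # placeholder token
    query_params={"limit": "10"},
    variables={"user_id": "42"},
)
print(req.full_url, req.get_method())
```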
The output of the node will be JSON containing the response to your request. We recommend passing this JSON as context to an LLM and continuing with your pipeline.