Vector DB

Vector Query Node
Vector databases are a type of database often used in conjunction with LLMs. They store data from a source (e.g., a file, website, or YouTube video) and allow queries to return the most relevant data from that source. See the Vector Databases blog post for a detailed explanation of vector databases.
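To make "most relevant" concrete: a vector database typically stores each piece of source data as an embedding vector and answers a query by ranking the stored vectors by their similarity to the query's vector. The sketch below assumes cosine similarity and uses hand-written toy vectors in place of a real embedding model. The helper names `cosine` and `top_k` are hypothetical and are not part of any product API.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(store, key=lambda doc_id: cosine(query_vec, store[doc_id]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional "embeddings"; a real system would get these from an embedding model.
store = {
    "pricing.txt": [0.9, 0.1, 0.0],
    "install.txt": [0.1, 0.9, 0.1],
    "faq.txt": [0.7, 0.3, 0.2],
}
print(top_k([1.0, 0.0, 0.0], store))  # -> ['pricing.txt', 'faq.txt']
```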

Using Vector Databases in Your Pipeline

There are two ways to query data in a vector database from a pipeline: the Vector Query node and the Vector Store Reader node. A Vector Query node loads documents into a vector store and lets you query them to find the most similar documents. A Vector Store Reader node queries vectors that have already been stored permanently in a Vector Store.
  • Use a Vector Query node when your pipeline loads new data and needs to query it.
  • Use a Vector Store Reader node when you want to query existing data that has already been loaded into a Vector Store.

Vector Query

The Vector Query node has two input edges, "query" and "documents", and one output edge, "result". It accepts documents as input, usually loaded through a data loader (e.g., a file loader, URL loader, or YouTube video loader), and stores them in a temporary vector database.
Figure: A Vector Query node.
Figure: Example pipeline that loads and queries a document using a Vector Query node.
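The Vector Query node's data flow (documents in, temporary store built, query in, result out) can be sketched in a few lines. This is not the node's actual implementation: the word-count "embedding", the fixed-size chunking rule, and the names `chunk`, `embed`, `similarity`, and `vector_query` are hypothetical stand-ins chosen so the example runs without any external service.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 8) -> list[str]:
    """Split a document into chunks of at most `size` words (hypothetical chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (a real pipeline would call an embedding model)."""
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def vector_query(query: str, documents: list[str], max_chunks: int = 2) -> list[str]:
    """Mimic the node: index the 'documents' input in a temporary store, then answer the 'query' input."""
    # Temporary store: it exists only for this call, like the node's per-run vector database.
    temp_store = [(c, embed(c)) for doc in documents for c in chunk(doc)]
    q = embed(query)
    ranked = sorted(temp_store, key=lambda item: similarity(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:max_chunks]]  # the 'result' output

docs = [
    "refunds are issued within 30 days of purchase",
    "the installer supports windows and linux",
]
print(vector_query("how do refunds work", docs, max_chunks=1))
# -> ['refunds are issued within 30 days of purchase']
```

Because the store is rebuilt on every call, this pattern suits data that arrives with the pipeline run; data you query repeatedly is better kept in a persistent Vector Store.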

Vector Store Reader

The Vector Store Reader node has one input edge, "query", and one output edge, "result". It links to an existing Vector Store that you have defined (use the drop-down menu within the node to find existing vector stores). Unlike the Vector Query node, the Vector Store Reader accepts only a "query" input. The node returns the most relevant documents from the existing vector store.
See how to create a Vector Store.
The Vector Store Reader node can be combined with an LLM to create a chatbot that answers questions about your documentation.
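A common way to wire a Vector Store Reader into a chatbot is retrieval-augmented prompting: the reader's "result" (the most relevant chunks) is placed into the LLM prompt as context alongside the user's question. The sketch below only assembles that prompt. The retrieved chunks are faked, and `build_prompt` and its template wording are hypothetical, not the product's actual LLM node template.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine a Vector Store Reader's results with the user's question into one LLM prompt."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(retrieved_chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical chunks standing in for the Reader's 'result' output.
chunks = ["Widgets are configured in settings.yaml.", "Widgets reload when settings.yaml changes."]
print(build_prompt("Where are widgets configured?", chunks))
```

Numbering the chunks makes it easy to ask the LLM to cite which piece of documentation supports its answer.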

Query Options

You can change several query parameters to improve your search results.
  • Max Chunks Per Query: Controls how many chunks (documents) are returned from the vector database.
  • Enable Filter: Filters the documents retrieved from the database based on their metadata. See how to filter documents.
  • Rerank Documents: Performs an additional reranking step that reorders the documents by relevance to the query.
    • Note: Reranking adds latency but may improve query results.
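Max Chunks Per Query and Rerank Documents can be pictured as a two-stage retrieval: a fast vector search produces candidates, an optional slower reranker reorders them, and only the top chunks are returned. The sketch below assumes that ordering (truncation after reranking) and uses a trivial keyword-overlap scorer as a stand-in for a real reranking model, which is often a learned cross-encoder. The names `rerank_score` and `retrieve` and the candidate scores are hypothetical.

```python
def rerank_score(query: str, text: str) -> float:
    """Stand-in reranker: fraction of query words present in the text.
    A production reranker would typically be a learned model such as a cross-encoder."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0

def retrieve(query: str, candidates: list[tuple[str, float]],
             max_chunks: int = 3, rerank: bool = False) -> list[str]:
    """candidates: (chunk, vector_similarity) pairs from the first-stage vector search."""
    if rerank:
        # Extra scoring pass over every candidate: this is where the added latency comes from.
        ordered = sorted(candidates, key=lambda c: rerank_score(query, c[0]), reverse=True)
    else:
        ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [text for text, _ in ordered[:max_chunks]]  # Max Chunks Per Query

candidates = [
    ("billing overview for teams", 0.82),
    ("how to cancel a subscription", 0.79),
    ("team billing and invoices", 0.75),
]
print(retrieve("cancel subscription", candidates, max_chunks=1))               # -> ['billing overview for teams']
print(retrieve("cancel subscription", candidates, max_chunks=1, rerank=True))  # -> ['how to cancel a subscription']
```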
When creating a Vector Store, you can enable the "hybrid search" option. Hybrid search lets you control the tradeoff between dense (semantic) and lexical (keyword) search through the Alpha parameter:
  • Increasing Alpha emphasizes semantic search.
  • Decreasing Alpha emphasizes lexical matching, i.e., finding exact keywords in the documents.
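One common way hybrid search systems use an alpha parameter is a weighted (convex) combination of the two scores, where Alpha = 1 is pure semantic search and Alpha = 0 is pure keyword search. Whether this product normalizes and fuses scores exactly this way is not stated here, so treat the formula and the hypothetical `hybrid_score` helper below as an assumption that only illustrates the direction of the tradeoff.

```python
def hybrid_score(dense: float, lexical: float, alpha: float) -> float:
    """Blend a dense (semantic) score and a lexical (keyword) score.
    Assumes both scores are already normalized to [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense + (1 - alpha) * lexical

# doc_a: semantically close but shares few exact keywords; doc_b: the opposite.
docs = {"doc_a": (0.9, 0.2), "doc_b": (0.4, 0.95)}
for alpha in (0.8, 0.2):
    best = max(docs, key=lambda d: hybrid_score(*docs[d], alpha))
    print(alpha, best)  # 0.8 -> doc_a (semantic wins), 0.2 -> doc_b (keywords win)
```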

Metadata Filtering

Filtering on document metadata can improve the relevance of the returned documents.
Checking the "Enable Filter" box lets you enter an additional filter query.
You can specify a filter query using syntax similar to MongoDB's query language. To filter on a specific metadata field, specify the field name and the desired value. For example, if a vector database stores a collection of book summaries with metadata, you can query for all books in the "mystery" genre:
{"genre": "mystery"}
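To show what a MongoDB-style filter does to the retrieved documents, here is a small matcher. It handles only a chosen subset of the syntax: plain equality (as in {"genre": "mystery"}) plus the `$eq`, `$ne`, and `$in` operators borrowed from MongoDB. Which operators the Enable Filter box actually accepts is not specified here, so treat that operator set, the `matches` helper, and the sample book records as illustrative assumptions.

```python
from typing import Any

def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if a document's metadata satisfies every condition in the filter."""
    for field, cond in flt.items():
        value = metadata.get(field)
        if isinstance(cond, dict):  # operator form, e.g. {"$in": [...]}
            for op, arg in cond.items():
                if op == "$eq":
                    ok = value == arg
                elif op == "$ne":
                    ok = value != arg
                elif op == "$in":
                    ok = value in arg
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != cond:  # shorthand equality, e.g. {"genre": "mystery"}
            return False
    return True

books = [
    {"title": "The Hound of the Baskervilles", "genre": "mystery"},
    {"title": "Dune", "genre": "science fiction"},
    {"title": "Gone Girl", "genre": "thriller"},
]
print([b["title"] for b in books if matches(b, {"genre": "mystery"})])
# -> ['The Hound of the Baskervilles']
print([b["title"] for b in books if matches(b, {"genre": {"$in": ["mystery", "thriller"]}})])
# -> ['The Hound of the Baskervilles', 'Gone Girl']
```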