Knowledge Base

Create a permanent embedded database for frequently reused data.

Knowledge bases allow users to convert and store data as vector embeddings, which represent the meaning of the stored data. This representation makes it possible to retrieve the data most relevant to a query. You can access knowledge bases in the "Knowledge base" sub-tab of the Storage tab.
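To make the idea of "most relevant data" concrete, here is a minimal sketch of ranking stored chunks by cosine similarity to a query vector. The function names, chunk ids, and toy three-dimensional vectors are illustrative assumptions; the platform's actual similarity measure and embedding dimensions are not stated in this document.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_chunks(query_vector, chunk_vectors):
    """Return chunk ids sorted from most to least similar to the query."""
    scores = {
        chunk_id: cosine_similarity(query_vector, vector)
        for chunk_id, vector in chunk_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy vectors; real embeddings have far more dimensions.
chunks = {
    "refund-policy": [0.9, 0.1, 0.0],
    "office-hours": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

print(rank_chunks(query, chunks))  # ['refund-policy', 'office-hours']
```

The query vector points in nearly the same direction as the "refund-policy" vector, so that chunk is ranked first.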

You can store a wide range of data in a knowledge base, including files, scraped URLs, GitHub repositories, live-synced data from integrations (e.g., Google Drive, OneDrive, Notion, Airtable), and more. Each data source is converted into a number of vectors that depends on its length.

Create a New Knowledge Base

To create a new Knowledge Base, take the following steps:

  1. Navigate to the Storage tab and click "New".

  2. In the pop-up, fill in the Name and Description.

Under "Advanced Settings", you can also choose the chunk size, chunk overlap, hybrid search, and embedding model.

  1. Chunk size: when data is loaded into the knowledge base, the database chunks the data, that is, cuts it into pieces. This setting controls the size of each chunk. The default is 400 tokens, or roughly 1,600 characters (about 4 characters per token). Decreasing the chunk size can sometimes help return more relevant information to an LLM (if a chunk is too large, the LLM may struggle to reason over the large amount of data).

  2. Chunk overlap: the number of tokens shared between consecutive chunks. The default is 0 tokens. Increase the chunk overlap if you are concerned that chunking is cutting off important context (e.g., if a chunk boundary falls in the middle of a word).

  3. Embedding model: the model used to embed the data into the knowledge base / vector database. The default is "text-embedding-3-small", chosen for its balance of performance and speed.

  4. Hybrid search: lets you control the tradeoff between dense (semantic) and lexical (keyword) search. Consider hybrid search if specific keywords matter in your knowledge base and you want chunks containing those keywords to rank higher when a user's query mentions them.
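One way to picture the dense/lexical tradeoff in hybrid search is as a weighted blend of two scores. The sketch below is only an illustration: the `alpha` weight, the word-overlap lexical score, and the example dense scores are all assumptions, and the platform's actual scoring method is not described in this document.

```python
def keyword_overlap(query, chunk):
    """Fraction of query words that appear in the chunk (a crude lexical score)."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)

def hybrid_score(dense_score, lexical_score, alpha=0.5):
    """Blend scores: alpha=1.0 is purely semantic, alpha=0.0 is purely keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense_score + (1.0 - alpha) * lexical_score

query = "error E1234"
# Hypothetical chunks paired with made-up dense (semantic) similarity scores.
candidates = {
    "Troubleshooting error E1234 on startup": 0.60,
    "General guidance on fixing problems": 0.70,
}

def best_chunk(alpha):
    """Return the chunk with the highest blended score for the given alpha."""
    return max(
        candidates,
        key=lambda text: hybrid_score(candidates[text], keyword_overlap(query, text), alpha),
    )

print(best_chunk(alpha=1.0))  # purely semantic ranking
print(best_chunk(alpha=0.5))  # keywords weighted equally with semantics
```

With `alpha=1.0` the general-guidance chunk wins on its dense score alone; with `alpha=0.5` the chunk that literally contains "E1234" moves to the top, which is the behavior hybrid search is meant to provide for keyword-heavy knowledge bases.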

  3. Add any relevant data to the knowledge base by clicking "Add Documents" and selecting the appropriate data loader: Integration for data from an integration such as Google Drive, URL for the contents of a URL, File for a static file (PDF, Word, CSV), or Recursive URL for scraping all subpages associated with a URL.

Note:

  1. Integrations are kept live-synced, meaning that new data and changes in the files you selected are automatically embedded.

  2. URL / Recursive URL: these data loaders give you the option to re-scrape and re-embed the contents at predetermined time intervals (to capture changes to the website).
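The chunk size and chunk overlap settings under "Advanced Settings" determine how each added document is cut up before it is embedded. The following is a minimal sketch of that idea, assuming the rule of thumb of 4 characters per token and a simple sliding character window; the function name `chunk_text` is hypothetical, and the platform's real chunker may split on token or sentence boundaries instead.

```python
CHARS_PER_TOKEN = 4  # rule of thumb: about 4 characters per token

def chunk_text(text, chunk_size_tokens=400, overlap_tokens=0):
    """Split text into character windows whose size is given in approximate tokens."""
    if chunk_size_tokens <= 0:
        raise ValueError("chunk_size_tokens must be positive")
    if not 0 <= overlap_tokens < chunk_size_tokens:
        raise ValueError("overlap_tokens must be >= 0 and smaller than the chunk size")

    size = chunk_size_tokens * CHARS_PER_TOKEN
    step = size - overlap_tokens * CHARS_PER_TOKEN  # how far each window advances

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

# A 4,000-character sample document.
sample = "".join(str(i % 10) for i in range(4000))

no_overlap = chunk_text(sample)                        # defaults: 400 tokens, 0 overlap
with_overlap = chunk_text(sample, overlap_tokens=100)  # 100 tokens = 400 characters shared

print([len(c) for c in no_overlap])    # [1600, 1600, 800]
print([len(c) for c in with_overlap])  # [1600, 1600, 1600]
```

With an overlap of 100 tokens, the last 400 characters of each chunk are repeated at the start of the next one, so text that falls on a chunk boundary still appears intact in at least one chunk.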

Edit an Existing Knowledge Base

To edit an existing knowledge base, take the following steps:

  1. Double-click the knowledge base.

  2. Add additional data by clicking on "Add Documents".

  3. Delete a data source by clicking on the trash can icon.

  4. Integrations: change or add data from an integration by double-clicking its row.

  5. URLs: click the gear icon to change how often the URL is re-scraped.

Keeping Information Up to Date

While all integrations are synced at a regular interval, you can also manually trigger a sync by clicking "Sync Integrations". VectorShift will check whether there have been any additions or updates to the files you have chosen to include in your knowledge base(s). Files that have been edited will be removed and re-embedded, while new files located inside folders/directories you have chosen to include will be added.

Keep in mind that this task may take a while to complete, depending on the number of files, vectors, and different integrations you have chosen to include in your vector store.
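The sync behavior described above (edited files are removed and re-embedded, new files are added) can be sketched as a comparison between what was indexed at the last sync and what exists now. This is an illustration under stated assumptions: detecting edits via SHA-256 content hashes, the `plan_sync` function, and the file paths are all hypothetical, and how VectorShift actually detects changes is not described here.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Content hash used to detect edits (an assumed change-detection method)."""
    return hashlib.sha256(content).hexdigest()

def plan_sync(indexed, current):
    """Decide which files to re-embed and which to add.

    indexed: {path: fingerprint} recorded at the last sync
    current: {path: fingerprint} observed in the integration now
    """
    edited = sorted(
        path for path in current
        if path in indexed and current[path] != indexed[path]
    )
    new = sorted(path for path in current if path not in indexed)
    return {"re_embed": edited, "add": new}

# Hypothetical state: a.pdf was edited, b.pdf is unchanged, c.pdf is new.
indexed = {
    "docs/a.pdf": fingerprint(b"v1"),
    "docs/b.pdf": fingerprint(b"same"),
}
current = {
    "docs/a.pdf": fingerprint(b"v2"),
    "docs/b.pdf": fingerprint(b"same"),
    "docs/c.pdf": fingerprint(b"new"),
}

print(plan_sync(indexed, current))
# {'re_embed': ['docs/a.pdf'], 'add': ['docs/c.pdf']}
```

Unchanged files are skipped entirely, which is why the time a sync takes depends mostly on how many files have changed or been added and how many vectors they produce.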

Metadata Filters

To further control the results of querying a knowledge base, you can enable the filter input on the Knowledge Base node.

Metadata filters allow structured filtering of the documents in a knowledge base. Each piece of information may be stored with relevant metadata, such as the file type, the timestamp at which it was created, etc.

Metadata filters follow the syntax used by Qdrant; see the Qdrant documentation for how to construct queries. For example, a filter on a "timestamp" metadata field can restrict a query to Slack messages from the last hour.
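As a rough illustration of the last-hour example, the sketch below builds a filter in Qdrant's JSON filter syntax (a `must` clause with a `range` condition on a payload key). It assumes the "timestamp" field stores Unix time in seconds and that the filter input accepts the filter as JSON; neither detail is confirmed by this document, and the helper `last_hour_filter` is hypothetical.

```python
import json
import time

def last_hour_filter(now=None, field="timestamp"):
    """Build a Qdrant-style filter matching items from the last hour.

    Assumes `field` holds a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    return {
        "must": [
            {"key": field, "range": {"gte": now - 3600}}
        ]
    }

# A fixed "now" keeps the example reproducible.
example = last_hour_filter(now=1_700_000_000)
print(json.dumps(example, indent=2))
```

Conditions listed under `must` all have to match; Qdrant also supports `should` and `must_not` clauses for combining conditions in other ways.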
