Knowledge

Create a permanent embedded database for frequently reused data.

The Knowledge menu allows users to convert and store data as vector embeddings, representing the stored data's meaning. This representation enables users to ensure that they are always using the most relevant data based on a query.

You can store a wide range of data in a knowledge base, including files, scraped URLs, GitHub repositories, live-synced data from integrations (e.g., Google Drive, OneDrive, Notion, Airtable), and more. Each data source is converted into a number of vectors that depends on its length.
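To make the idea of retrieving "the most relevant data based on a query" concrete, here is a minimal sketch of similarity search over vectors. It is only an illustration: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the function names and sample documents are assumptions, not part of the product.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    A real embedding model captures meaning, not just shared words.
    """
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_relevant(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents ranked by similarity to the query."""
    query_vector = toy_embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, toy_embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical sample data for illustration only.
documents = [
    "refund policy for annual plans",
    "how to reset your password",
    "office holiday schedule",
]
best_match = most_relevant("reset password", documents)
```

In this sketch the query shares words only with the second document, so that document ranks first. A real embedding model would also match queries that use different wording with the same meaning.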

Create a New Knowledge

To create a new Knowledge, take the following steps:

  1. Navigate to the Storage tab and click "New."

  2. Within the pop-up, fill in the Name and Description.

Under "Advanced Settings," you can choose the chunk size, overlap, hybrid search, and embedding model:

  • Chunk size: When data is loaded into the knowledge base, it is chunked, that is, cut into pieces. This setting controls the size of each chunk. The chunk size defaults to 400 tokens, or roughly 1,600 characters (about 4 characters per token). Decreasing the chunk size can sometimes help return more relevant information to an LLM: if a chunk is too large, the LLM has to reason over a large amount of data and can get confused.

  • Chunk overlap: The number of tokens shared between consecutive chunks. It defaults to 0 tokens. Increase the chunk overlap if you are concerned that chunking cuts off essential data (e.g., if a chunk boundary falls in the middle of a word).

  • Processing Model: Select a processing model that suits your needs. The "Default" option is typically used for general purposes, or you can choose Llama Parse for 0.3 cents per page.

  • Functionality: Enter your Apify API key to integrate Apify services for web scraping or data extraction.

  • Embedding Model: This model is used to embed the data into the knowledge/vector database. It defaults to "text-embedding-3-small," which is chosen for its performance and speed. Alternatively, you can choose "embed-multilingual-v3.0" for multi-language documents.

  • Advanced Document Analysis (Beta): Enable this feature for enhanced document analysis. Activating this option may incur additional LLM (Large Language Model) usage charges.

  3. Add data to the knowledge base by clicking "Add Documents" and selecting the appropriate data loader: Integration for data from an integration such as Google Drive, URL for the contents of a URL, File for a static file (PDF, Word, CSV), or Recursive URL for scraping all subpages associated with a URL.

Other Options:

  • Add New Integrations: Integrations are kept live-synced, meaning that new data and changes in the files you selected are embedded automatically.

  • URL / Recursive URL: Choosing these data loaders allows you to re-scrape and re-embed the contents at predetermined intervals (to capture changes to the website).

  4. After adding documents, click the preview button to see how the embedded data is saved in the knowledge base.

    Document chunk preview
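The chunk size and chunk overlap settings described under "Advanced Settings" can be illustrated with a small sketch. This is not the product's actual chunker: it splits on characters and converts tokens to characters with the 4-characters-per-token rule of thumb from the text, and the function name `chunk_text` is hypothetical.

```python
CHARS_PER_TOKEN = 4  # rule of thumb from the text: 4 characters = 1 token


def chunk_text(text: str, chunk_size_tokens: int = 400, overlap_tokens: int = 0) -> list[str]:
    """Split text into fixed-size chunks, optionally overlapping.

    Sizes are given in tokens and approximated in characters
    (an assumption; a real chunker would use a tokenizer).
    """
    if chunk_size_tokens <= 0:
        raise ValueError("chunk_size_tokens must be positive")
    if not 0 <= overlap_tokens < chunk_size_tokens:
        raise ValueError("overlap_tokens must be >= 0 and smaller than the chunk size")

    size = chunk_size_tokens * CHARS_PER_TOKEN
    step = size - overlap_tokens * CHARS_PER_TOKEN  # how far each new chunk advances

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


# Default settings: 400 tokens (about 1,600 characters) per chunk, no overlap.
sample = "".join(str(i % 10) for i in range(3200))
default_chunks = chunk_text(sample)

# With a 100-token overlap, the end of one chunk repeats at the start of the next.
overlapping_chunks = chunk_text(sample, chunk_size_tokens=400, overlap_tokens=100)
```

With the defaults, the 3,200-character sample yields two chunks. With a 100-token overlap it yields three, and the last 400 characters of each chunk reappear at the start of the following one, which is what protects content that would otherwise be split at a chunk boundary.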

Edit an Existing Knowledge

To edit an existing knowledge base, take the following steps:

  1. Double-click the knowledge base.

  2. Add additional data by clicking on "Add Documents."

  3. Delete a data source by clicking on the trash can icon.

  4. Integrations: change or add data from an integration by double-clicking its row.

  5. URLs: click the gear icon to change how often the URL is re-scraped.

Keeping Information Up to Date

While all integrations are synced regularly, you can manually trigger a sync by clicking "Sync Integrations." VectorShift will check whether there have been any additions or updates to the files you have chosen to include in your knowledge base(s). Files that have been edited will be removed and re-embedded, while new files located inside folders/directories you have chosen to include will also be added.

Remember that this task may take a while to complete, depending on the number of files, vectors, and different integrations you have chosen to include in your vector store.
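The sync behavior described above (edited files are removed and re-embedded, new files are added) can be sketched as a simple diff. How VectorShift actually detects changes is not stated here; comparing content hashes is an assumption, and `fingerprint` and `plan_sync` are hypothetical names.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to detect edits (an assumed detection method)."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(indexed: dict[str, str], current_files: dict[str, bytes]) -> dict[str, list[str]]:
    """Decide which files need work during a sync.

    indexed:       path -> fingerprint recorded when the file was last embedded
    current_files: path -> current file content in the selected folders
    """
    to_add = []
    to_re_embed = []
    for path, content in current_files.items():
        if path not in indexed:
            to_add.append(path)        # new file inside a selected folder
        elif indexed[path] != fingerprint(content):
            to_re_embed.append(path)   # edited file: remove old vectors, embed again
    return {"add": sorted(to_add), "re_embed": sorted(to_re_embed)}


# Hypothetical state for illustration only.
indexed = {"a.txt": fingerprint(b"old text"), "b.txt": fingerprint(b"unchanged")}
current_files = {"a.txt": b"new text", "b.txt": b"unchanged", "c.txt": b"brand new"}
plan = plan_sync(indexed, current_files)
```

Unchanged files are skipped entirely in this sketch, which is why the duration of a sync scales with the number of changed files and vectors. Deleted files are not handled here because the text above does not describe that case.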

Knowledge Configurations

You can configure several parameters that control how the knowledge base is queried and what results it returns.

  • Enable Filter: Activate filters to refine search results based on specific criteria, enhancing the relevance of retrieved information. Filters allow structured filtering of the documents in a knowledge base. For each piece of information, relevant metadata may be stored, such as the timestamp at which a file was created.

    Metadata filters follow the syntax used by Qdrant; see the documentation for how to construct queries. For example, we can create a filter on a "timestamp" metadata field to query only Slack messages from our knowledge base from the last hour.

  • Rerank Documents: Enable this option to reorder documents based on their relevance to the query, improving the accuracy of responses.

  • Retrieval Unit: Choose the unit of retrieval, such as "chunks," to specify how data is segmented and accessed during queries.

  • Do NL Metadata Query: Perform natural language queries on metadata to enhance search capabilities and result precision.

  • Transform Query: Enable query transformation to modify user inputs for better alignment with the knowledge base structure.

  • Answer Multiple Questions: This allows the system to address multiple questions in a single query, improving efficiency and user satisfaction.

  • Expand Query: Automatically expand queries to include related terms and concepts, broadening the scope of search results.

  • Do Advanced QA (Beta): Use advanced question-answering techniques to provide more detailed and accurate responses. Note that this may incur additional charges.

  • Show Intermediate Steps: Display the intermediate steps taken during query processing for transparency and debugging purposes.
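As an illustration of the metadata filter mentioned under "Enable Filter," the sketch below builds a Qdrant-style filter that keeps only items whose "timestamp" field falls within the last hour. The `must` / `key` / `range` structure follows Qdrant's filter syntax; storing the timestamp as Unix epoch seconds and the helper name `last_hour_filter` are assumptions, so check how your metadata is actually stored before reusing it.

```python
import json
import time


def last_hour_filter(now=None, field="timestamp"):
    """Build a Qdrant-style range filter for items from the last hour.

    Assumes the metadata field holds Unix epoch seconds (not confirmed
    by the documentation above).
    """
    current = time.time() if now is None else now
    one_hour_ago = current - 3600
    return {
        "must": [
            {"key": field, "range": {"gte": one_hour_ago}}
        ]
    }


# Serialize the filter to JSON, e.g. for pasting into a filter field.
filter_json = json.dumps(last_hour_filter(), indent=2)
```

Conditions listed under `must` all have to match, so additional conditions (for example, on a source field) can be appended to the same list.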
