Knowledge Bases

Interact with Knowledge Bases through Python classes.

Knowledge Bases are a storage system offered by the VectorShift platform. They let you store various kinds of data, such as text, files, and scraped URLs, as one or more vector embeddings that represent the meaning of that data. We suggest reading the platform documentation on Knowledge Bases for context.

The SDK offers a class-based interface on top of the API endpoints to easily fetch, manipulate, and save Knowledge Bases. Since some methods interface with the VectorShift platform, they require API keys. If the API keys have already been set as environment variables, they need not be supplied to those methods. The output of all methods invoking APIs is a dictionary representing the API JSON response.

Creating Knowledge Bases

The KnowledgeBase class represents a Knowledge Base that either already exists on the platform (if an ID is given) or is new (if no ID is given).

from vectorshift.knowledge_base import KnowledgeBase

def KnowledgeBase(
    name: str,
    description: str = '',
    chunk_size: int = 400,
    chunk_overlap: int = 0,
    is_hybrid: bool = False,
    id: str = None,
)

Note: To work with existing Knowledge Base objects, we suggest using the fetch method instead.

Arguments:

  • name: The name of the Knowledge Base.

  • description: The description of the Knowledge Base.

  • chunk_size: The chunk size of the Knowledge Base (the default size, in bytes, of each unit of information uploaded).

  • chunk_overlap: The chunk overlap of the Knowledge Base (the default number of bytes of overlap between each unit of information uploaded). For instance, if the total data size is 1000 units, a size of 500 and overlap of 0 will give 2 documents (units 1-500, 501-1000), while an overlap of 250 gives 3 (units 1-500, 251-750, 501-1000).

  • is_hybrid: Whether the Knowledge Base supports hybrid search. Hybrid search allows you to control the tradeoff between dense (semantic) and lexical (keyword) search.

  • id: (Optional) The ID of the Knowledge Base, which, if given, should correspond with a Knowledge Base you already own on the VectorShift platform. If blank, it represents a new Knowledge Base.
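The interaction between chunk_size and chunk_overlap can be checked with a few lines of arithmetic. The sketch below is an illustration only: chunk_spans is a hypothetical helper, not part of the SDK, and it assumes each new chunk starts chunk_size - chunk_overlap positions after the previous one. The platform's actual splitter may behave differently.

```python
def chunk_spans(total: int, chunk_size: int, chunk_overlap: int = 0) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans, one per chunk.

    Hypothetical illustration of the size/overlap arithmetic; this is
    not the splitter used by the VectorShift platform.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each new chunk advances
    spans = []
    start = 0
    while start < total:
        end = min(start + chunk_size, total)
        spans.append((start + 1, end))
        if end == total:  # the last chunk reached the end of the data
            break
        start += step
    return spans


print(chunk_spans(1000, 500, 0))    # [(1, 500), (501, 1000)]
print(chunk_spans(1000, 500, 250))  # [(1, 500), (251, 750), (501, 1000)]
```

A larger overlap produces more chunks for the same data, so it increases the number of vectors stored.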

from vectorshift.knowledge_base import KnowledgeBase

knowledge = KnowledgeBase(name="Vectorshift Chatbot", description="Knowledge Base for SDK related questions")
knowledge.save()

fetch

A static method that creates a KnowledgeBase object representing an existing Knowledge Base on the VectorShift platform, given an ID or name.

def fetch(
     base_id: str = None,
     base_name: str = None,
     username: str = None,
     org_name: str = None,
     api_key: str = None,
     public_key: str = None,
     private_key: str = None,
)

Arguments:

  • base_id: The ID of the Knowledge Base to fetch.

  • base_name: The name of the Knowledge Base to fetch. At least one of base_id or base_name should be provided. If both are provided, base_name is used to search for the Knowledge Base.

  • username: (Optional) The username of the user owning the Knowledge Base.

  • org_name: (Optional) The organization name of the user owning the Knowledge Base, if applicable.

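  • api_key: The VectorShift API key.

  • public_key: The public key to use for authentication, if applicable.

  • private_key: The private key to use for authentication, if applicable.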
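Since fetch needs at least one of base_id or base_name, it can help to check that before calling the platform. The following is a minimal sketch, not part of the SDK: fetch_knowledge_base is a hypothetical wrapper, and kb_class is assumed to be the KnowledgeBase class (or any class exposing a compatible static fetch method).

```python
def fetch_knowledge_base(kb_class, base_id=None, base_name=None, api_key=None):
    """Call kb_class.fetch after checking that an ID or a name was given.

    Hypothetical helper; only the keyword names base_id, base_name and
    api_key come from the fetch signature documented above.
    """
    if base_id is None and base_name is None:
        raise ValueError("Provide at least one of base_id or base_name")

    # Only forward the arguments that were actually supplied, so that
    # the SDK can fall back to its own defaults (e.g. environment keys).
    kwargs = {}
    if base_id is not None:
        kwargs["base_id"] = base_id
    if base_name is not None:
        kwargs["base_name"] = base_name
    if api_key is not None:
        kwargs["api_key"] = api_key
    return kb_class.fetch(**kwargs)
```

With the SDK installed, a call might look like fetch_knowledge_base(KnowledgeBase, base_name="Vectorshift Chatbot"), after importing KnowledgeBase from vectorshift.knowledge_base.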

save

A method to save or update a KnowledgeBase object to the VectorShift platform.

def save(update_existing: bool = False, api_key: str = None) 

Arguments:

  • update_existing: Whether to replace an existing Knowledge Base (True) or save the object as a new one (False). If set to True, the KnowledgeBase should have an ID, and the existing Knowledge Base with the ID will be replaced with the object's data. If set to False, the ID, if any, is ignored and a new Knowledge Base object is created with the object's data.

  • api_key: The VectorShift API key.
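Because update_existing=False always creates a new Knowledge Base, saving a fetched object without the flag would leave a duplicate. One way to avoid that is to derive the flag from whether the object has an ID. This is a sketch under an assumption: that the ID is readable as an id attribute on the object, which the text above does not confirm. save_knowledge_base is a hypothetical helper.

```python
def save_knowledge_base(kb, api_key=None):
    """Update the platform copy if kb has an ID, otherwise create a new one.

    Assumption: the ID passed to the constructor is exposed as `kb.id`.
    """
    update_existing = getattr(kb, "id", None) is not None

    kwargs = {"update_existing": update_existing}
    if api_key is not None:
        kwargs["api_key"] = api_key
    # The SDK documents the return value as a dict of the API JSON response.
    return kb.save(**kwargs)
```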

update_metadata

A method to update metadata fields for items in a Knowledge Base. The KnowledgeBase object should already have an ID.

def update_metadata(
    list_of_item_ids: list[str],
    list_of_metadata: list[str],
    keep_prev: bool,
    api_key: str = None,
    public_key: str = None,
    private_key: str = None,
) 

Arguments:

  • list_of_item_ids: The IDs of the items whose metadata is to be updated.

  • list_of_metadata: The new metadata for all items. It should have the same length as list_of_item_ids. For each i, the i-th element of list_of_metadata will be the new metadata for the item identified by the i-th element of list_of_item_ids.

  • keep_prev: Whether to keep the existing metadata for each item. If set to True, the new metadata is added to the existing metadata. If set to False, the old metadata is discarded and replaced.

  • api_key: The VectorShift API key.

  • public_key: The public key to use for authentication, if applicable.

  • private_key: The private key to use for authentication, if applicable.

update_selected_files

A method to update files associated with an integration. The KnowledgeBase object should already have an ID.

def update_selected_files(
    integration_id: str,
    keep_prev: bool,
    selected_items: Optional[list[str]] = None,
    select_all_items_flag: Optional[bool] = True,
    api_key: str = None,
)

Arguments:

  • integration_id: The ID of the specific integration with associated files in the Knowledge Base. Only files associated with this integration will be updated.

  • keep_prev: Whether or not to keep the previous versions of the files. If set to True, additional files are added separately.

  • selected_items: Names of the specific files to update.

  • select_all_items_flag: If this flag is True, all files associated with the integration are updated.

  • api_key: The VectorShift API key.
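Note that select_all_items_flag defaults to True, so naming files in selected_items without changing the flag may still update everything. The sketch below sets the flag from whether specific files were requested. This rests on an assumption that the flag should be False when selected_items is given; the text above does not state how the two interact. update_integration_files is a hypothetical helper.

```python
def update_integration_files(kb, integration_id, file_names=None,
                             keep_prev=True, api_key=None):
    """Update all files of an integration, or only the named ones.

    Hypothetical helper. Assumption: select_all_items_flag should be
    False when specific files are listed in selected_items.
    """
    kwargs = {"integration_id": integration_id, "keep_prev": keep_prev}
    if file_names is None:
        kwargs["select_all_items_flag"] = True
    else:
        kwargs["selected_items"] = list(file_names)
        kwargs["select_all_items_flag"] = False
    if api_key is not None:
        kwargs["api_key"] = api_key
    return kb.update_selected_files(**kwargs)
```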

sync

A method to sync the Knowledge Base object with the VectorShift platform (such that the object reflects the most up-to-date version of the Knowledge Base from the platform). The KnowledgeBase object should already have an ID.

def sync(
    api_key: str = None,
)

Arguments:

  • api_key: The VectorShift API key.

load_documents

A method to add/load a new document into the Knowledge Base, or add files associated with an integration. The KnowledgeBase object should already have an ID.

def load_documents(
    document,
    document_name: str = None,
    document_type: str = 'File',
    chunk_size: int = None,
    chunk_overlap: int = None,
    selected_items: list = None,
    select_all_items_flags: list = None,
    metadata: dict = None,
    metadata_by_item: dict = None,
    api_key: str = None,
)

Arguments:

  • document: The document to load into the Knowledge Base. It should correspond with the value of document_type; see below.

  • document_name: The name of the document.

  • document_type: The type of document provided. It should be one of the following options:

    • "File": Loads a file. document should be the path to a file.

    • "Integration": Represents an integration. All files from the integration are loaded. document should be JSON data for the integration.

    Otherwise, document is treated as text.

  • chunk_size: The maximum size of vectors that this document will be split into (if the document must be split into multiple vectors).

  • chunk_overlap: The striding of vectors that this document will be split into (if the document must be split into multiple vectors).

  • selected_items: Used when document_type is "Integration". Lists the names of the specific files associated with the integration to update.

  • select_all_items_flags: Used when document_type is "Integration". If this flag is True, all files associated with the integration are updated.

  • metadata: General metadata to be added to (each of) the new document(s).

  • metadata_by_item: Used to add metadata to specific documents. Should be a dictionary of file names to document-specific metadata.

  • api_key: The VectorShift API key.
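For the common case of loading a single local file, the arguments above can be assembled as in the sketch below. load_file is a hypothetical helper, not part of the SDK. It assumes that leaving chunk_size and chunk_overlap out of the call makes the SDK use the Knowledge Base defaults, and it derives document_name from the file name, which is a choice made here.

```python
import os


def load_file(kb, path, metadata=None, chunk_size=None,
              chunk_overlap=None, api_key=None):
    """Load one local file into a Knowledge Base with document_type="File".

    Hypothetical helper. Optional arguments are only forwarded when set,
    on the assumption that the SDK then applies its own defaults.
    """
    kwargs = {
        "document": path,                          # path to the file
        "document_name": os.path.basename(path),   # e.g. "guide.pdf"
        "document_type": "File",
    }
    optional = {
        "metadata": metadata,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "api_key": api_key,
    }
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kb.load_documents(**kwargs)
```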

query

A method to query the Knowledge Base for specific documents. Returns a JSON response with the documents.

def query(
    query: str, 
    max_docs: int = 5,
    filter: dict = None, 
    rerank: bool = False, 
    api_key: str = None, 
)

Arguments:

  • query: A string query to the Knowledge Base.

  • max_docs: The maximum number of documents to be returned from the query.

  • filter: Additional filters. Only documents whose metadata contains the key-value pairs specified in filter will be returned.

  • rerank: Whether or not to rerank documents upon retrieval.

  • api_key: The VectorShift API key.
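A query restricted by metadata could be wrapped as in the following sketch. search is a hypothetical helper; it only validates its inputs locally and forwards them using the keyword names from the query signature above. The shape of the returned dictionary is not described here, so the helper returns it unchanged.

```python
def search(kb, text, max_docs=5, metadata_filter=None, rerank=False, api_key=None):
    """Query a Knowledge Base, optionally restricted by metadata.

    Hypothetical helper. `metadata_filter` is forwarded as the `filter`
    argument: only documents whose metadata contains these key-value
    pairs are returned.
    """
    if not text or not text.strip():
        raise ValueError("query text must not be empty")
    if max_docs < 1:
        raise ValueError("max_docs must be at least 1")

    kwargs = {"query": text, "max_docs": max_docs, "rerank": rerank}
    if metadata_filter:
        kwargs["filter"] = dict(metadata_filter)
    if api_key is not None:
        kwargs["api_key"] = api_key
    return kb.query(**kwargs)
```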

list_documents

def list_documents(
    max_documents: int = 5,
    api_key: str = None, 
)

A method to list all existing documents in the Knowledge Base. Returns a JSON representation of the documents.

Arguments:

  • max_documents: The maximum number of documents to be returned.

  • api_key: The VectorShift API key.

delete_documents

def delete_documents(
    document_id: list[str],
    filter: dict = None,
    api_key: str = None, 
)

A method to delete a document by ID from the Knowledge Base. We are currently in the process of building out this functionality.

Arguments:

  • document_id: The ID of the document to delete. For now, this should be a singleton list containing the ID.

  • filter: Forthcoming filter for documents to delete. Currently has no usage.

  • api_key: The VectorShift API key.
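Since document_id currently has to be a singleton list, deleting several documents means one call per ID. A minimal sketch of that loop follows. delete_documents_one_by_one is a hypothetical helper, and collecting the responses in a dictionary keyed by ID is a choice made here.

```python
def delete_documents_one_by_one(kb, document_ids, api_key=None):
    """Delete several documents by issuing one singleton-list call per ID.

    Hypothetical helper; returns {document_id: API response}.
    """
    auth = {} if api_key is None else {"api_key": api_key}
    responses = {}
    for doc_id in document_ids:
        # Each call passes exactly one ID, as the method currently requires.
        responses[doc_id] = kb.delete_documents(document_id=[doc_id], **auth)
    return responses
```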

share

def share(
    shared_users: list[str], 
    api_key: str = None, 
)

A method to share a Knowledge Base object with one or more emails.

Arguments:

  • shared_users: A list of emails to share the Knowledge Base with.

  • api_key: The VectorShift API key.

fetch_shared

def fetch_shared(
    api_key: str = None, 
)

A method that returns a list of all emails with which the Knowledge Base is shared.

Arguments:

  • api_key: The VectorShift API key.

remove_share

def remove_share(
    users_to_remove: list[str],
    api_key: str = None, 
)

A method to remove sharing from one or more emails.

Arguments:

  • users_to_remove: A list of emails from which to remove sharing.

  • api_key: The VectorShift API key.
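The share and remove_share methods can be combined to adjust access in one step. The sketch below is a hypothetical helper, update_sharing, that skips a call when its list is empty; it uses only the keyword names shared_users and users_to_remove from the signatures above.

```python
def update_sharing(kb, add=(), remove=(), api_key=None):
    """Share with the emails in `add` and unshare from those in `remove`.

    Hypothetical helper; returns the API responses of the calls it made.
    """
    auth = {} if api_key is None else {"api_key": api_key}
    responses = {}
    if add:
        responses["share"] = kb.share(shared_users=list(add), **auth)
    if remove:
        responses["remove_share"] = kb.remove_share(
            users_to_remove=list(remove), **auth
        )
    return responses
```

Calling fetch_shared afterwards is one way to confirm the resulting list of emails.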
