The KnowledgeBase class is the SDK surface for VectorShift’s managed retrieval store. Ingest files, URLs, folders, tables, or third-party integrations; query them with vector + keyword search, hybrid fusion, rerank, and optional QA — directly from Python, or via an Agent tool / a Pipeline node.
Prerequisites: Installed SDK · API key set · Python 3.10+.

Mental model

  • A KB is a named collection of items (files, URLs, table rows, integration records). Each item is chunked, embedded, and indexed once on ingest.
  • Ingestion is task-based: every add_files / add_urls / add_folder / add_tables call returns an IngestionTask you can poll. Each method also has an _and_wait variant that blocks until the task is COMPLETED.
  • Querying is a single surface: kb.query("text", top_k=…, filters=…, hybrid=…, rerank=…, qa=…). Pass kwargs or a single QueryConfig, never both. Returns a typed QueryResult.
  • Every method has an async variant (anew, aadd_files, aquery, ascroll, …).
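
The async variants make it easy to fan out several queries at once. The sketch below assumes that aquery accepts the same arguments as query (a query string plus kwargs such as top_k); that is an assumption, not something this page confirms. FakeKB and query_many are hypothetical names used only so the example runs without network access.

```python
import asyncio
from typing import Any, Iterable


async def query_many(kb: Any, questions: Iterable[str], top_k: int = 5) -> list:
    """Run one aquery call per question concurrently; results keep input order."""
    # Assumption: kb.aquery mirrors kb.query("text", top_k=...).
    pending = [kb.aquery(question, top_k=top_k) for question in questions]
    return await asyncio.gather(*pending)


class FakeKB:
    """Hypothetical stand-in for a KnowledgeBase so the sketch runs offline."""

    async def aquery(self, text: str, top_k: int = 5) -> dict:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"query": text, "top_k": top_k}


results = asyncio.run(
    query_many(FakeKB(), ["What is RAG?", "What is reranking?"], top_k=3)
)
for result in results:
    print(result)
```

With a real KnowledgeBase you would pass kb in place of FakeKB(); asyncio.gather returns the results in the same order as the questions.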
Where Knowledge Bases live in your code. A KB is rarely the endpoint — most production deployments expose it through one of two paths: as an AgentTools.knowledge_base(id=kb.id, …) tool on a conversational Agent (the RAG-with-Agent pattern; see the RAG guide), or as a KnowledgeBaseNode(knowledge_base=kb, …) inside a Pipeline (the RAG-pipeline pattern; see the rag-pipeline example).

Quick start

from vectorshift import KnowledgeBase
from vectorshift.knowledge_base import (
    IndexingConfig, SplitterMethod, UrlConfig, RescrapeFrequency,
)

# Create the KB with an embedding model and chunking config.
kb = KnowledgeBase.new(
    name="product-docs",
    embedding_model="text-embedding-3-small",
    indexing_config=IndexingConfig(
        chunk_size=500, chunk_overlap=50, splitter=SplitterMethod.MARKDOWN,
    ),
)

# Ingest a URL. add_urls_and_wait blocks until indexing completes.
kb.add_urls_and_wait(
    urls=["https://en.wikipedia.org/wiki/Retrieval-augmented_generation"],
    url_config=UrlConfig(
        recursive=False, rescrape_frequency=RescrapeFrequency.NEVER,
    ),
    timeout=240,
)

# Query — first arg is the query string, options are kwargs.
res = kb.query("What is RAG?", top_k=5)
print(f"{len(res.chunks)} chunks")
print(res.chunks[0].text[:200])
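
The Quick start indexes res.chunks[0] directly, which raises IndexError when a query matches nothing. A minimal sketch of a guard follows. It relies only on the .chunks and .text attributes used above; preview_chunks is a hypothetical helper, and the SimpleNamespace objects are stand-ins for a real QueryResult.

```python
from types import SimpleNamespace
from typing import Any


def preview_chunks(result: Any, limit: int = 3, width: int = 200) -> list[str]:
    """Return up to `limit` chunk previews, or an empty list when nothing matched."""
    chunks = getattr(result, "chunks", None) or []
    return [chunk.text[:width] for chunk in chunks[:limit]]


# Stand-ins shaped like the result used in the Quick start (.chunks[i].text).
empty = SimpleNamespace(chunks=[])
hit = SimpleNamespace(
    chunks=[SimpleNamespace(text="RAG combines retrieval with generation.")]
)

print(preview_chunks(empty))          # no match: empty list, no exception
print(preview_chunks(hit, width=12))  # first 12 characters of the only chunk
```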

How to use a Knowledge Base

| | Direct query | Via an Agent | Via a Pipeline |
|---|---|---|---|
| Surface | kb.query("text", …) | AgentTools.knowledge_base(id=kb.id, …) on Agent.new(tools=[…]) | KnowledgeBaseNode(knowledge_base=kb, …) inside Pipeline.new(...) |
| Output | Typed QueryResult.chunks, .answer, .citations | Streamed MESSAGE_DELTA events with <vs-cite> tags inline | A pipeline node output (.formatted_text, etc.) you wire into downstream nodes |
| Use when | Building your own retrieval logic; offline scoring; smoke-testing the KB | Conversational RAG: the model decides when to retrieve, emits citations, supports multi-turn memory | Deterministic graphs where retrieval is one fixed step (RAG-pipeline, chatbots, batch jobs) |
| Guide | This page’s Quick start | RAG end-to-end | rag-pipeline example |

Ingestion sources

| Method | What it ingests | Wait helper |
|---|---|---|
| add_files | Local files (Path or bytes-likes) | add_files_and_wait |
| add_urls | URLs, with optional recursive crawl and rescrape schedule via UrlConfig | add_urls_and_wait |
| add_folder | An entire directory tree | add_folder_and_wait |
| add_tables | Structured table rows | add_tables_and_wait |
| Integrations (Slack, Google Drive, …) | Configured in the platform, then resync_integration refreshes them | |
All methods return an IngestionTask with .task_id, .status, .item_ids, and (on failure) .error / .failed_uploads. The _and_wait variants poll until terminal status; the bare ones return immediately and let you poll via ingestion_status(task_id).
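
If you use the bare methods, the polling loop is yours to write. The following is a rough sketch of one, built on ingestion_status(task_id) and the .status field named above. Only COMPLETED appears on this page; the FAILED, PENDING, and PROCESSING status names, the wait_for_ingestion helper, and the FakeKB class are assumptions for illustration, so check the reference for the real status values.

```python
import time
from types import SimpleNamespace
from typing import Any

# "COMPLETED" comes from this page; "FAILED" is an assumed failure status name.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_ingestion(
    kb: Any, task_id: str, timeout: float = 240.0, interval: float = 2.0
) -> Any:
    """Poll kb.ingestion_status(task_id) until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        task = kb.ingestion_status(task_id)
        # Works for plain strings and for enum members (which expose .name).
        status = getattr(task.status, "name", task.status)
        if status in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout}s")
        time.sleep(interval)


class FakeKB:
    """Hypothetical stand-in that replays a fixed sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def ingestion_status(self, task_id: str) -> SimpleNamespace:
        return SimpleNamespace(task_id=task_id, status=next(self._statuses))


# "PENDING" and "PROCESSING" are placeholder non-terminal statuses.
fake = FakeKB(["PENDING", "PROCESSING", "COMPLETED"])
task = wait_for_ingestion(fake, "task-123", interval=0.0)
print(task.task_id, task.status)
```

The deadline uses time.monotonic() so that system clock adjustments cannot shorten or extend the wait.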

Recent additions

The KB surface was overhauled: ingestion is now task-based (add_files / add_urls / add_folder / add_tables + _and_wait variants), kb.query(...) returns a typed QueryResult with .chunks / .citations / .answer, and querying takes either kwargs (top_k, filters, hybrid, rerank, qa) or a single QueryConfig. Items can be enumerated and filtered via list_items / scroll and re-organised via create_folder / move_items / update_item_metadata.
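
The "kwargs or a single QueryConfig, never both" rule can be pictured with a small sketch. This is not the SDK's implementation: QueryOptions is a hypothetical stand-in for QueryConfig, its field types are guesses, and resolve_options is an illustrative helper showing one way such a contract might be enforced.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class QueryOptions:
    """Hypothetical stand-in for QueryConfig; the field types are assumptions."""

    top_k: int = 5
    hybrid: bool = False
    rerank: bool = False


def resolve_options(
    config: Optional[QueryOptions] = None, **kwargs: Any
) -> QueryOptions:
    """Accept a config object or keyword options, but reject a mix of the two."""
    if config is not None and kwargs:
        raise ValueError("pass either a config object or keyword options, not both")
    return config if config is not None else QueryOptions(**kwargs)


print(resolve_options(top_k=3))
print(resolve_options(QueryOptions(top_k=10, rerank=True)))

try:
    resolve_options(QueryOptions(), top_k=3)  # mixing both styles is rejected
except ValueError as exc:
    print(f"rejected: {exc}")
```

Rejecting the mixed call outright avoids having to define which side wins when the config object and a kwarg disagree.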

What’s next

  • Reference: every public method, grouped by topic.
  • RAG end-to-end guide: wrap a KB as a tool on a conversational Agent.
  • RAG pipeline example: compose a KB reader into a Pipeline.