By the end of this guide you’ll have a working RAG endpoint: a Knowledge Base full of your documents, plus a conversational Agent that can retrieve from it on any turn and answers the user with proper citations.
Prerequisites. Installed SDK · API key set · Python 3.10+. About 15 minutes.
What you’ll build
Create the Knowledge Base
KnowledgeBase.new takes an embedding model and an IndexingConfig. The SplitterMethod selector tells the indexer how to chunk; MARKDOWN is a good default for docs.
See KnowledgeBase.new for every option.
Ingest documents
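As a mental model for the blocking …_and_wait variants described in this section, a blocking call amounts to polling a task until it reaches a terminal state. The sketch below runs against a stand-in `FakeTask` class and a hypothetical `wait_for` helper. Neither is part of the SDK, and the real IngestionTask fields, intermediate status names, and polling interval may differ. Only the COMPLETED and FAILED status names come from this guide.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeTask:
    """Stand-in for an ingestion task (hypothetical, not the SDK class)."""
    statuses: List[str]              # scripted statuses, consumed one per refresh
    status: str = "PENDING"          # "PENDING" is an assumed initial state
    error: Optional[str] = None

    def refresh(self) -> None:
        # A real client would call the API here; this pops the next scripted status.
        if self.statuses:
            self.status = self.statuses.pop(0)


def wait_for(task, timeout_s: float = 5.0, poll_s: float = 0.01):
    """Poll until the task is COMPLETED or FAILED, or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        task.refresh()
        if task.status in ("COMPLETED", "FAILED"):
            return task
        time.sleep(poll_s)
    raise TimeoutError(f"task still {task.status} after {timeout_s}s")


done = wait_for(FakeTask(statuses=["PENDING", "RUNNING", "COMPLETED"]))
print(done.status)
```

The fire-and-forget mode corresponds to skipping `wait_for` and checking the task later, which suits batch jobs where nothing downstream needs the documents immediately.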
Two flavours: files (add_files / add_files_and_wait) and URLs (add_urls / add_urls_and_wait). Both have a fire-and-forget mode (returns an IngestionTask immediately) and a blocking …_and_wait variant that polls until COMPLETED.
Smoke-test retrieval
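One way to make a retrieval smoke test repeatable is to assert that a known question surfaces a chunk containing an expected phrase. The sketch below runs on made-up sample data: `FakeChunk`, its `text` and `score` fields, and the `top_chunks_mention` helper are hypothetical names for illustration, not the SDK's result shape. Adapt the field access to whatever kb.query(...) returns in your version.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Stand-in for a retrieved chunk (field names are assumptions)."""
    text: str
    score: float


def top_chunks_mention(chunks, phrase: str, k: int = 3) -> bool:
    """Return True if any of the k highest-scoring chunks contains the phrase."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)[:k]
    return any(phrase.lower() in c.text.lower() for c in ranked)


# Made-up sample chunks for illustration only.
chunks = [
    FakeChunk("Refunds are processed within 5 business days.", 0.91),
    FakeChunk("Our office is closed on public holidays.", 0.40),
    FakeChunk("Contact support to update billing details.", 0.35),
]

print(top_chunks_mention(chunks, "refunds"))    # True
print(top_chunks_mention(chunks, "warranty"))   # False
```

A handful of such question and phrase pairs, run after every reindex, catches most chunking and splitter regressions before they reach the agent.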
Verify retrieval works before plugging the KB into an agent.
kb.query(...) returns a QueryResult with .chunks, .citations, and an optional .answer.
If the top results look wrong, retrieval is your problem, so fix it here before wiring up the agent. Common culprits: a too-small chunk_size, the wrong splitter, or missing filters.
Build a conversational agent with the KB as a tool
AgentTools.knowledge_base(id=kb.id, ...) is the catalogue entry that gives any Agent native semantic retrieval over your KB. Plug it into a conversational agent with session memory and you get RAG out of the box: the model decides when to retrieve, formats context for itself, and emits citations.
rerank_documents=True runs a cross-encoder over the top-k hits to push the most relevant chunks to the top; format_context_for_llm=True returns the retrieval result pre-templated so the model can cite it directly. Both are off by default; turn them on for production RAG.
Run multi-turn through a session
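Replies in this section carry citations inline as <vs-cite item='…'/> tags, so a UI usually needs a small post-processing step. A minimal sketch of one approach: replace each tag with a numbered marker and collect the `item` values in order of first appearance. It assumes the tag is self-closing with a single quoted `item` attribute, as shown in this guide, and treats the value as an opaque string. `render_citations` is a hypothetical helper, not part of the SDK, and the `doc-a` / `doc-b` values are placeholders.

```python
import re

# Matches <vs-cite item='...'/> or <vs-cite item="..."/>; the value is opaque here.
CITE_RE = re.compile(r"""<vs-cite\s+item=(['"])(.*?)\1\s*/>""")


def render_citations(reply: str):
    """Swap each cite tag for a [n] marker; return (text, ordered list of items)."""
    items = []

    def _sub(match):
        item = match.group(2)
        if item not in items:          # repeated sources reuse their first number
            items.append(item)
        return f"[{items.index(item) + 1}]"

    return CITE_RE.sub(_sub, reply), items


text, sources = render_citations(
    "Refunds take 5 days<vs-cite item='doc-a'/> and need a receipt"
    "<vs-cite item='doc-b'/>. See also<vs-cite item='doc-a'/>."
)
print(text)     # Refunds take 5 days[1] and need a receipt[2]. See also[1].
print(sources)  # ['doc-a', 'doc-b']
```

If a tag can be split across streamed deltas, it is safer to run this on the accumulated reply text rather than on each delta in isolation.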
Conversational agents run inside a Session. Use it as an async context manager, send() user turns, and listen() for MESSAGE_DELTA / MESSAGE_COMPLETE to stream the reply.
The agent will call product_docs whenever it needs to ground a claim. Retrieved chunks appear in event.delta as <vs-cite item='…'/> tags inlined into the reply text; render those however your UI needs.
Observe the retrieval (optional)
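A quick way to observe retrieval is to tally events by type while the listen loop runs. The sketch below uses a stand-in async generator in place of a real session stream. `FakeEvent`, its `type` field, and `fake_listen` are hypothetical, and only the event type names (TOOL_CALL, TOOL_RESULT, MESSAGE_DELTA, MESSAGE_COMPLETE) come from this guide.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Stand-in for a session event (the `type` field name is an assumption)."""
    type: str


async def fake_listen():
    """Pretend stream: one retrieval round-trip, then a streamed reply."""
    for t in ("TOOL_CALL", "TOOL_RESULT", "MESSAGE_DELTA",
              "MESSAGE_DELTA", "MESSAGE_COMPLETE"):
        yield FakeEvent(t)


async def tally(stream) -> Counter:
    """Count events by type until the reply completes."""
    counts = Counter()
    async for event in stream:
        counts[event.type] += 1
        if event.type == "MESSAGE_COMPLETE":
            break
    return counts


counts = asyncio.run(tally(fake_listen()))
print(f"retrieval round-trips: {counts['TOOL_CALL']}")  # retrieval round-trips: 1
```

Logging this tally per turn gives a cheap signal for how often the agent actually grounds its answers in the KB.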
To see exactly when the agent retrieves, drop the event_types filter so the loop also receives TOOL_CALL and TOOL_RESULT events.
Each TOOL_CALL is one retrieval round-trip. If you see zero, the model decided it didn’t need to retrieve. That usually means the question doesn’t require KB context, not that something is broken.
Operational tips
- Reindex on schedule. For URL sources, set rescrape_frequency to RescrapeFrequency.WEEKLY (or DAILY) so the KB stays current automatically.
- Filter at query time. Pass filters=[FilterClause(field="team", op=FilterOperator.EQ, value="hr")] on kb.query to scope retrieval directly; the agent-tool path uses the KB’s own search config (turn on enable_filter if you want the agent to set filters itself).
- Watch for KbIngestionFailed / KbIngestionTimeout on ingest. Most failures are oversized files or unsupported MIME types; final.status will be FAILED and final.error will tell you why.
- Make product_docs mandatory. The default approval_config for knowledge_base tools is AUTO_RUN, so the agent retrieves without asking. If you’d rather force every query through retrieval, instruct the model explicitly: “Always call product_docs before answering.”
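To make the query-time filter tip concrete, the sketch below mimics what an equality clause does conceptually: it keeps only chunks whose metadata field matches the given value. It uses plain dictionaries and a hypothetical `apply_eq_filter` helper rather than the SDK's FilterClause. How the service applies filters internally (for example, before or after vector search) is not covered by this guide.

```python
_MISSING = object()  # sentinel so an absent field never matches, even for value=None


def apply_eq_filter(chunks, field: str, value):
    """Keep chunks whose metadata[field] equals value; a missing field never matches."""
    return [
        c for c in chunks
        if c.get("metadata", {}).get(field, _MISSING) == value
    ]


# Made-up sample chunks for illustration only.
chunks = [
    {"text": "Parental leave policy", "metadata": {"team": "hr"}},
    {"text": "Deploy checklist", "metadata": {"team": "eng"}},
    {"text": "Untagged note", "metadata": {}},
]

hr_only = apply_eq_filter(chunks, field="team", value="hr")
print([c["text"] for c in hr_only])  # ['Parental leave policy']
```

The untagged chunk drops out under this model, which is worth remembering: documents ingested without the metadata field may be invisible to a filtered query.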
What’s next
- Customer support bot: Add more tools (web search, approvals) on top of this agent.
- RAG pipeline example: The pipeline-shaped alternative (no agent, no session).
- KnowledgeBase reference: All ingest + query options.
