Lifecycle

new

KnowledgeBase.new(
    name: str,
    embedding_model: Optional[EmbeddingModel] = None,
    indexing_config: Optional[IndexingConfig] = None,
) -> KnowledgeBase
Create a new knowledge base.
Parameters
name
str
required
Display name of the knowledge base.
embedding_model
Optional[EmbeddingModel]
default:"None"
One of the strings listed in EmbeddingModel. Defaults to the platform default if omitted.
indexing_config
Optional[IndexingConfig]
default:"None"
Chunking + analysis options. See IndexingConfig.
Returns
returns
KnowledgeBase
The new KnowledgeBase instance.

fetch

KnowledgeBase.fetch(
    id: Optional[str] = None,
    name: Optional[str] = None,
) -> KnowledgeBase
Fetch by id or name. Exactly one is required.
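A minimal client-side guard for the exactly-one rule, as a sketch. The helper name fetch_kwargs is hypothetical and not part of the SDK; how fetch itself reacts to zero or two selectors is not documented here, so the guard raises its own ValueError before any call is made.

```python
from typing import Dict, Optional


def fetch_kwargs(id: Optional[str] = None, name: Optional[str] = None) -> Dict[str, str]:
    """Build kwargs for KnowledgeBase.fetch, insisting on exactly one selector.

    Hypothetical helper: it only validates arguments and never touches the network.
    """
    # (id is None) == (name is None) is True when both or neither were given.
    if (id is None) == (name is None):
        raise ValueError("pass exactly one of id or name")
    return {"id": id} if id is not None else {"name": name}


# Intended use (requires the SDK, so it is left as a comment):
# kb = KnowledgeBase.fetch(**fetch_kwargs(name="support-docs"))
```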

list

KnowledgeBase.list(limit: int = 20, offset: int = 0) -> list[KnowledgeBase]
Paginated list of the KBs visible to the API key’s org.
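The limit/offset pair can be walked with a small generator. This is a sketch: iter_pages is a hypothetical helper, and it assumes that a page shorter than limit marks the end of the listing, which this page does not state explicitly.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(list_fn: Callable[..., List[T]], page_size: int = 20) -> Iterator[T]:
    """Yield every record from a callable that accepts limit= and offset=.

    Assumption: a short or empty page means nothing is left to fetch.
    """
    offset = 0
    while True:
        page = list_fn(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


# Intended use (requires the SDK, so it is left as a comment):
# for kb in iter_pages(KnowledgeBase.list):
#     print(kb)
```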

delete

KnowledgeBase.delete() -> None
Permanently delete the KB and every item in it.

Folders

create_folder

KnowledgeBase.create_folder(
    name: str,
    parent_folder_id: Optional[str] = None,
) -> KbFolder
Folders organise items inside a KB. Pass a folder_id to any add_* method to ingest into a specific folder, or use move_items to reorganise afterwards.

Ingestion

Every ingestion call returns an IngestionTask. The _and_wait variants poll until the task reaches a terminal status (COMPLETED, FAILED, or WARNING, as listed under IngestionTask); the bare variants return immediately and let you drive the polling yourself via ingestion_status.

add_files

KnowledgeBase.add_files(
    files: Sequence[FileLike],
    folder_id: Optional[str] = None,
) -> IngestionTask
Ingest one or more local files. Each entry can be a Path, raw bytes, or a file-like object.

add_files_and_wait

KnowledgeBase.add_files_and_wait(
    files: Sequence[FileLike],
    folder_id: Optional[str] = None,
    timeout: float = 300,
) -> IngestionTask
Blocking variant of add_files. Polls until the task reaches a terminal status or timeout elapses. Raises KbIngestionTimeout on timeout, KbIngestionFailed if the task ends in FAILED.

add_urls

KnowledgeBase.add_urls(
    urls: list[str],
    url_config: Optional[UrlConfig] = None,
    folder_id: Optional[str] = None,
) -> IngestionTask
Ingest URLs, optionally with a recursive crawl and a rescrape schedule. See UrlConfig.

add_urls_and_wait

KnowledgeBase.add_urls_and_wait(
    urls: list[str],
    url_config: Optional[UrlConfig] = None,
    folder_id: Optional[str] = None,
    timeout: float = 600,
) -> IngestionTask
Blocking variant of add_urls.

add_folder / add_folder_and_wait

KnowledgeBase.add_folder(
    folder_path: Union[str, Path],
    folder_id: Optional[str] = None,
) -> IngestionTask
Walk a local directory and ingest every file. _and_wait blocks until done (default timeout=600).

add_tables / add_tables_and_wait

KnowledgeBase.add_tables(
    tables: Sequence[TableLike],
    folder_id: Optional[str] = None,
) -> IngestionTask
Ingest structured table rows. _and_wait variant blocks until done.

ingestion_status

KnowledgeBase.ingestion_status(task_id: str) -> IngestionTask
Fetch the latest state of an ingestion task. Use after a bare add_* call to drive your own polling loop.
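A hand-rolled polling loop might look like the sketch below. wait_for_task and status_name are hypothetical helpers; the terminal set follows the IngestionTask section, the 2-second interval is an arbitrary choice, and the built-in TimeoutError stands in for whatever error you prefer to raise.

```python
import time
from typing import Any

# Terminal statuses as listed under IngestionTask.
TERMINAL = {"COMPLETED", "FAILED", "WARNING"}


def status_name(status: Any) -> str:
    """Normalise an enum member or a plain string to an upper-case name."""
    return str(getattr(status, "name", status)).upper()


def wait_for_task(kb: Any, task_id: str, timeout: float = 300, interval: float = 2.0) -> Any:
    """Poll kb.ingestion_status(task_id) until a terminal status or the deadline."""
    deadline = time.monotonic() + timeout
    while True:
        task = kb.ingestion_status(task_id)
        if status_name(task.status) in TERMINAL:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        time.sleep(interval)
```

The caller still has to inspect the returned task's status, since FAILED and WARNING end the loop just as COMPLETED does.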

Querying

query

KnowledgeBase.query(
    text: str,
    *,
    top_k: int = 10,
    filters: Optional[list[FilterClause]] = None,
    sort: Optional[SortClause] = None,
    hybrid: Optional[HybridConfig] = None,
    rerank: Optional[RerankConfig] = None,
    qa: Optional[QaConfig] = None,
    config: Optional[QueryConfig] = None,
) -> QueryResult
Retrieve relevant chunks for the query.
Parameters
text
str
required
The query string.
top_k
int
default:"10"
Maximum number of chunks to return (post-rerank if rerank= is set).
filters
Optional[list[FilterClause]]
default:"None"
Metadata filters. See FilterClause.
sort
Optional[SortClause]
default:"None"
Sort order over results. See SortClause.
hybrid
Optional[HybridConfig]
default:"None"
Enable hybrid (vector + keyword) retrieval. See HybridConfig.
rerank
Optional[RerankConfig]
default:"None"
Cross-encoder rerank over the top candidates. See RerankConfig.
qa
Optional[QaConfig]
default:"None"
Run an LLM over the retrieved context and return a .answer. See QaConfig.
config
Optional[QueryConfig]
default:"None"
Pass a pre-built QueryConfig instead of individual kwargs. Cannot be combined with other tuning kwargs.
Returns
returns
QueryResult
Typed QueryResult with .chunks, .citations, optional .answer, optional .usage.
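To show how the returned fields fit together, here is a sketch that renders a result as printable lines. summarize_result is a hypothetical helper; it relies only on .answer, .chunks, and the score, item_id, and text fields listed under RetrievedChunk, and keeps the chunks in the order they were returned.

```python
from typing import Any, List


def summarize_result(result: Any, max_chars: int = 80) -> List[str]:
    """Render a QueryResult-shaped object as one line per chunk.

    The answer line is emitted only when an answer is present.
    """
    lines = []
    if getattr(result, "answer", None):
        lines.append(f"answer: {result.answer}")
    for chunk in result.chunks:
        # Trim long chunks and flatten newlines so each chunk stays on one line.
        snippet = chunk.text[:max_chars].replace("\n", " ")
        lines.append(f"{chunk.score:.3f}  {chunk.item_id}  {snippet}")
    return lines


# Intended use (requires the SDK, so it is left as a comment):
# print("\n".join(summarize_result(kb.query("refund policy", top_k=5))))
```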

Items

list_items

KnowledgeBase.list_items(
    folder_id: Optional[str] = None,
    limit: int = 50,
    filters: Optional[list[FilterClause]] = None,
    sort: Optional[SortClause] = None,
    cursor: Optional[str] = None,
) -> list[KbItem]
Paginated listing of items, optionally filtered and sorted.

scroll

KnowledgeBase.scroll(
    page_size: int = 100,
    filters: Optional[list[FilterClause]] = None,
    folder_id: Optional[str] = None,
    sort: Optional[SortClause] = None,
) -> Iterator[list[KbItem]]
Stream through every item in pages — useful for large KBs where list_items would require manual cursor management.
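Because scroll yields pages rather than single items, a thin generator can flatten them. This is a sketch; iter_items is a hypothetical helper that forwards its keyword arguments to scroll unchanged.

```python
from typing import Any, Iterator


def iter_items(kb: Any, **scroll_kwargs: Any) -> Iterator[Any]:
    """Flatten the pages produced by kb.scroll(...) into a stream of items."""
    for page in kb.scroll(**scroll_kwargs):
        yield from page


# Intended use (requires the SDK, so it is left as a comment):
# total = sum(1 for _ in iter_items(kb, page_size=200))
```

Only one page is held in memory at a time, which is the point of using scroll on a large KB.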

get_item

KnowledgeBase.get_item(item_id: str) -> KbItem

move_items

KnowledgeBase.move_items(item_ids: list[str], target_folder_id: str) -> None
Move items into a folder (created via create_folder).

delete_items

KnowledgeBase.delete_items(item_ids: list[str]) -> None

update_item_metadata

KnowledgeBase.update_item_metadata(
    item_id: str,
    custom_metadata: dict[str, Any],
) -> KbItem
Attach or replace custom metadata on an item. Filter against this metadata at query time via FilterClause.

Accessors

A KB exposes three accessor objects for scoped operations. They keep the top-level surface lean by routing item / folder / metadata-autogen calls through dedicated handles.

item

KnowledgeBase.item(item_id: str) -> KnowledgeBaseItemRef
Return a lightweight reference to an item. The ref carries the KB id + item id; use it as a typed alternative to passing raw item ids around.

folder

KnowledgeBase.folder(folder_id: str) -> KnowledgeBaseFolderRef
Return a lightweight reference to a folder. Use the ref to scope subsequent operations (e.g. pass it as folder_id= to ingestion calls or list_items).

metadata_autogen

KnowledgeBase.metadata_autogen -> MetadataAutogenAccessor  # property
Accessor for the metadata-autogeneration subsystem — extract structured metadata (summaries, topics, custom fields) from items via LLM-driven pipelines. A KB can hold many autogen configs, each addressed by an autogen_id.

create_config

kb.metadata_autogen.create_config(
    *,
    extraction_instructions: str,
    query_instructions: str,
    traversal_type: str = "document",
    debounce_seconds: int = 0,
    name: Optional[str] = None,
    document_config: Optional[dict] = None,
    folder_config: Optional[dict] = None,
    chunk_config: Optional[dict] = None,
    filters: Optional[dict] = None,
) -> MetadataAutogenConfigRecord
Register a new autogen config. extraction_instructions tells the LLM what to extract from each item; query_instructions controls how that metadata is matched at retrieval time. traversal_type is one of "document", "folder", or "chunk".

list_configs

kb.metadata_autogen.list_configs() -> list[MetadataAutogenConfigRecord]

get_config

kb.metadata_autogen.get_config(autogen_id: str) -> MetadataAutogenConfigRecord

replace_config

kb.metadata_autogen.replace_config(
    autogen_id: str,
    *,
    extraction_instructions: str,
    query_instructions: str,
    traversal_type: str = "document",
    debounce_seconds: int = 0,
    name: Optional[str] = None,
    document_config: Optional[dict] = None,
    folder_config: Optional[dict] = None,
    chunk_config: Optional[dict] = None,
    filters: Optional[dict] = None,
) -> MetadataAutogenConfigRecord
Full-overwrite update — every field omitted resets to its default server-side. Pre-fetch with get_config and merge if you only want to change one field.
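The fetch-then-merge pattern can be sketched as follows. patch_config is a hypothetical helper, and it assumes the MetadataAutogenConfigRecord exposes attributes named after the replace_config keyword arguments, which this page does not confirm.

```python
from typing import Any

# Keyword arguments accepted by replace_config, per the signature above.
CONFIG_FIELDS = (
    "extraction_instructions", "query_instructions", "traversal_type",
    "debounce_seconds", "name", "document_config", "folder_config",
    "chunk_config", "filters",
)


def patch_config(accessor: Any, autogen_id: str, **changes: Any) -> Any:
    """Change a few fields of an autogen config without resetting the rest.

    Assumption: the record returned by get_config has one attribute per
    CONFIG_FIELDS entry.
    """
    unknown = set(changes) - set(CONFIG_FIELDS)
    if unknown:
        raise TypeError(f"unknown config fields: {sorted(unknown)}")
    current = accessor.get_config(autogen_id)
    merged = {field: getattr(current, field) for field in CONFIG_FIELDS}
    merged.update(changes)
    return accessor.replace_config(autogen_id, **merged)


# Intended use (requires the SDK, so it is left as a comment):
# patch_config(kb.metadata_autogen, autogen_id, debounce_seconds=60)
```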

delete_config

kb.metadata_autogen.delete_config(autogen_id: str) -> None

run

kb.metadata_autogen.run(autogen_id: str, item_ids: list[str]) -> IngestionTask
Trigger autogen against a subset of items. Returns an IngestionTask you can poll, or use the wait variant below.

run_and_wait

kb.metadata_autogen.run_and_wait(
    autogen_id: str,
    item_ids: list[str],
    timeout: float = 300,
) -> IngestionTask
Blocking variant of run. See the metadata-autogen example for the full lifecycle.

Integrations

integrations

KnowledgeBase.integrations() -> list[KbIntegration]
List integrations attached to the KB. Integration setup happens in the platform.

resync_integration

KnowledgeBase.resync_integration(integration_id: str) -> None
Trigger a refresh against the upstream source.

set_rescrape_frequency

KnowledgeBase.set_rescrape_frequency(
    *,
    item_id: Optional[str] = None,
    integration_id: Optional[str] = None,
    frequency: RescrapeFrequency,
) -> None
Set a periodic re-scrape on a URL item or an integration. Pass exactly one of item_id / integration_id. See RescrapeFrequency.

Types

IndexingConfig

chunk_size
Optional[int]
default:"None"
chunk_overlap
Optional[int]
default:"None"
analyze_documents
bool
default:"False"
Run document-level analysis during indexing (slower, richer metadata).
splitter
Optional[SplitterMethod]
default:"None"
segmentation
Optional[SegmentationMethod]
default:"None"
index_tables
bool
default:"False"
enrichment_tasks
Optional[list[str]]
default:"None"
file_processing_implementation
str
default:"'Default'"
apify_key
Optional[str]
default:"None"

UrlConfig

recursive
bool
default:"False"
When True, crawl outbound links recursively up to url_limit.
url_limit
Optional[int]
default:"None"
Maximum number of URLs to fetch in a recursive crawl.
ai_enhance_content
Optional[bool]
default:"None"
return_type
str
default:"'CHUNKS'"
rescrape_frequency
RescrapeFrequency
default:"NEVER"
apify_key
Optional[str]
default:"None"

QueryConfig

Pre-built bundle of every option query(...) accepts as kwargs. Pass config= instead of individual kwargs when you want to reuse a config across queries.
top_k
int
default:"10"
filters
Optional[list[FilterClause]]
default:"None"
sort
Optional[SortClause]
default:"None"
hybrid
Optional[HybridConfig]
default:"None"
rerank
Optional[RerankConfig]
default:"None"
qa
Optional[QaConfig]
default:"None"
retrieval_unit
Optional[RetrievalUnit]
default:"None"
score_cutoff
Optional[float]
default:"None"
context
Optional[str]
default:"None"

HybridConfig

alpha
float
default:"0.5"
Weight between vector (1.0) and keyword (0.0) scoring.
fusion_method
Optional[str]
default:"None"
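One common reading of alpha is a convex combination of the two scores. The sketch below illustrates that reading only: the function blend is hypothetical, the platform's real fusion is not documented on this page, and fusion_method may select something different (for example a rank-based scheme).

```python
def blend(vector_score: float, keyword_score: float, alpha: float = 0.5) -> float:
    """Linear blend: alpha=1.0 is pure vector, alpha=0.0 is pure keyword.

    Illustrative assumption, not the platform's documented formula.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector_score + (1.0 - alpha) * keyword_score
```

Under this reading, raising alpha favours semantic matches, while lowering it favours exact keyword hits.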

RerankConfig

model
str
required
Reranker identifier (e.g. "bge-reranker-v2-m3", "cohere/rerank-english-v3.0").
top_n
Optional[int]
default:"None"
Number of chunks to keep after rerank.
adaptive_cutoff
Optional[bool]
default:"None"

QaConfig

mode
QaMode
default:"OFF"
See QaMode.
citations
bool
default:"True"
Whether to include Citations in the response.
response_format
Optional[str]
default:"None"
llm_provider
Optional[str]
default:"None"
llm_model
Optional[str]
default:"None"

FilterClause

field
str
required
Metadata field name to filter on.
op
FilterOperator
required
value
Any
required

SortClause

field
str
required
direction
SortDirection
default:"ASC"

IngestionTask

The handle returned by every add_* call. Reaches a terminal status (COMPLETED, FAILED, or WARNING) once indexing finishes.
task_id
str
required
kb_id
str
required
status
IngestionStatus
required
item_ids
list[str]
default:"[]"
The new item ids once the task completes.
error
Optional[str]
default:"None"
failed_uploads
list[FailedUpload]
default:"[]"

QueryResult

The typed return of kb.query(...).
query_id
str
required
chunks
list[RetrievedChunk]
default:"[]"
citations
list[Citation]
default:"[]"
answer
Optional[str]
default:"None"
LLM-generated answer when qa=QaConfig(mode=...) is set.
usage
Optional[UsageInfo]
default:"None"

RetrievedChunk

item_id
str
required
chunk_id
str
required
score
float
required
text
str
required
unit
RetrievalUnit
required
metadata
Optional[dict[str, Any]]
default:"None"

Enums

EmbeddingModel

String literal. Supported values:
  • "text-embedding-3-large"
  • "text-embedding-3-small"
  • "text-embedding-ada-002"
  • "embed-v4.0"
  • "voyage-3" · "voyage-4" · "voyage-multimodal-3" · "voyage-code-3"
  • "google-text-embedding-004" · "google-text-embedding-005"
  • "together-bge-large-en-v1.5" · "together-m2-bert-80M-8k-retrieval"

SplitterMethod

MARKDOWN · SENTENCE · DYNAMIC · CODE

RescrapeFrequency

NEVER · QUARTER_HOURLY · HOURLY · DAILY · WEEKLY · MONTHLY

QaMode

OFF · FAST · ACCURATE

FilterOperator

EQ · NE (NEQ) · CONTAINS · IN · NOT_IN (NIN) · GT · GTE · LT · LTE

IngestionStatus

PENDING · IN_PROGRESS · COMPLETED · FAILED · WARNING

SortDirection

ASC · DESC

Errors

The KB module raises a small set of typed errors. Catch them by name; all subclass KnowledgeBaseError, which in turn subclasses VectorshiftError.
  • KbNotFound — KB id/name doesn’t exist (or your key can’t see it).
  • KbIngestionFailed — an ingestion task ended in FAILED.
  • KbIngestionTimeout — add_*_and_wait exceeded its timeout.
  • KbIntegrationNotFound, KbIntegrationRevoked — integration resolution problems.
See the top-level Errors page for the broader hierarchy.
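Since every KB error shares a base class, except clauses should run from most specific to least specific. The sketch below uses local stand-in classes that mirror the hierarchy described above so it runs without the SDK; in real code you would import the SDK's own classes, whose import path is not shown on this page. The helper ingest_outcome and its return labels are hypothetical.

```python
# Local stand-ins mirroring the documented hierarchy (not the SDK classes).
class VectorshiftError(Exception): ...
class KnowledgeBaseError(VectorshiftError): ...
class KbNotFound(KnowledgeBaseError): ...
class KbIngestionFailed(KnowledgeBaseError): ...
class KbIngestionTimeout(KnowledgeBaseError): ...


def ingest_outcome(action) -> str:
    """Run a zero-argument callable and label how it ended."""
    try:
        action()
    except KbIngestionTimeout:
        return "timeout"      # task may still finish; poll again later
    except KbIngestionFailed:
        return "failed"       # task reached FAILED
    except KnowledgeBaseError:
        return "kb-error"     # any other KB-level problem, e.g. KbNotFound
    return "ok"
```

Placing the KnowledgeBaseError clause first would swallow the two more specific cases, which is why it comes last.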

What’s next

Overview

Mental model and quick start.

RAG end-to-end guide

Wrap a KB as a tool on a conversational Agent.

RAG pipeline example

Compose a KB reader into a Pipeline.