> ## Documentation Index
> Fetch the complete documentation index at: https://docs.vectorshift.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Knowledge & retrieval nodes

> Query, build, and sync knowledge bases.

Add these nodes with the pipeline builder: `pipeline.add(name="...").(...)`. Each entry lists the node's configuration parameters. See the [Pipeline reference](/sdk/pipeline/reference) for `add`, `run`, and lifecycle methods.

## `chunking`

Split text into chunks. Supports different chunking strategies like markdown-aware, sentence-based, or dynamic sizing.

```python Sync theme={"languages":{}}
pipeline.add(name="node").chunking()
```

**Parameters**

Strategy for grouping segmented text into final chunks. 'sentence': groups sentences; 'markdown': respects Markdown structure (headers, code); 'dynamic': optimizes breaks for size using chosen segmentation method.
One of: `dynamic`, `markdown`, `sentence`

The text to chunk

The overlap of each chunk of text.

The size of each chunk of text.

The method to break text into units before chunking. 'words': splits by word; 'sentences': splits by sentence boundary; 'paragraphs': splits by blank line/paragraph.
One of: `paragraphs`, `sentences`, `words`

## `create_workspace`

Create a new workspace in a portal, upload files to its knowledge base, and share with users

Platform docs: [Create a new workspace in a portal, upload files to its knowledge base, and share with users](/nodes/create-workspace/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").create_workspace(name="...", portal=...)
```

**Parameters**

## `knowledge_base`

Semantically query a knowledge base that can contain files, scraped URLs, and data from synced integrations (e.g., Google Drive).

Platform docs: [Semantically query a knowledge base that can contain files, scraped URLs, and data from synced integrations (e.g., Google Drive).](/nodes/knowledge-base-v3/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base(query="...", knowledge_base=...)
```

**Parameters**

Use additional LLM calls to analyze each document to improve answer correctness

Filter the content returned from the knowledge base. Agents should provide structured metadata filters directly in the filter input when useful.

Enable context

Format the context for the LLM

Enable the document DB filter

Generate an LLM response from the retrieved context

Whether to stream the LLM response

The query will be used to search documents for relevant content semantically. Must not be empty, only include relevant information for retrieval or metadata filter generation. Generally expand any specific acronyms or abbreviations but include the original acronym or abbreviation as well

Additional context to pass to the query analysis and qa steps

Filter the documents returned from the knowledge base

Structured metadata filter JSON for the knowledge base query. Use a top-level boolean clause such as \{"type":"condition","field":"title","operator":"match","value":"Q4 report"}; leave empty when no hard metadata constraint is needed.

Use an LLM to generate metadata filters to refine your query. Agents should usually leave this false and provide filters directly in the filter input.

Select an existing knowledge base, Use \$object.knowledge\_base.? syntax

The system prompt to use for the LLM

The alpha value for the retrieval.
1.0 is pure vector search and 0.0 is pure lexical search

Extract separate questions from the query and retrieve content separately for each question to improve search performance

Do a natural language metadata query

Expand query to improve semantic search

Expand query terms to improve semantic search

The number of chunks to rerank

Rerank the documents returned from the knowledge base

Refine the initial ranking of returned chunks based on relevancy
`cohere/rerank-english-v3.0`, `cohere/rerank-multilingual-v3.0`, `cohere/rerank-v3.5`, `cohere/rerank-v4.0-fast`, `cohere/rerank-v4.0-pro`, `contextualai/ctxl-rerank-en-v1-instruct`, `jina/jina-reranker-v2-base-multilingual`, `llm/google/gemini-2.5-flash`, `llm/google/gemini-2.5-flash-lite-preview-06-17`, `llm/google/gemini-2.5-pro`, `opensource/BAAI/bge-reranker-v2-m3`, `together/Salesforce/Llama-Rank-V1`, `together/mixedbread-ai/Mxbai-Rerank-Large-V2`, `voyageai/rerank-2`, `voyageai/rerank-2-lite`, `voyageai/rerank-2.5`, `voyageai/rerank-2.5-lite`

The unit of retrieval. Chunks will return the most relevant chunks from the knowledge base as well as their text content. Documents will return the document metadata as well as most relevant snippets from the document.
Pages will return complete pages with all chunks from pages containing relevant content
One of: `chunks`, `documents`, `pages`

The score cutoff

The number of relevant chunks to be returned

Transform the query for better semantic search

The mode to use for the advanced search
One of: `accurate`, `fast`

The model to use for the QA
`claude-3-5-sonnet-20240620`, `claude-3-haiku-20240307`, `claude-3-opus-20240229`, `gemini-2.0-flash-001`, `gemini-2.5-pro-exp-03-25`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-4o`, `gpt-4o-2024-08-06`, `gpt-4o-2024-11-20`, `gpt-4o-mini`, `gpt-4o-mini-2024-07-18`, `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `o1`, `o1-2024-12-17`, `o3`, `o3-mini`, `o3-pro`, `o4-mini`

## `knowledge_base_actions`

Create, load, and sync Knowledge Bases

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_actions()
```

**Parameters**

## `knowledge_base_agent`

Query a knowledge base using an agentic approach with tools.

Platform docs: [Query a knowledge base using an agentic approach with tools.](/nodes/knowledge-base-agent/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_agent(query="...", knowledge_base=...)
```

**Parameters**

Select the LLM provider to be used by the agent
One of: `google`

Controls the query effort: 'fast' for quick answers, 'focused' for balanced depth, 'deep' for thorough analysis
One of: `deep`, `fast`, `focused`

If enabled, shows an additional context input to provide context to the agent

If enabled, returns the relevant context/chunks used to generate the answer

Select the LLM model to be used by the agent
One of: `gemini-3-flash-preview`

The natural language query. The agent will use this to determine the best way to query the knowledge base. Include the key criteria needed to answer the query.

Optional additional context to help the agent understand the query better (e.g., conversation history, user preferences)

Select an existing knowledge base.
You must provide the id in \$.object.knowledge\_base.id format

If enabled, generates a synthesized answer from the knowledge base

## `knowledge_base_create`

Dynamically create a Knowledge Base with configured options

Platform docs: [Dynamically create a Knowledge Base with configured options](/nodes/create-knowledge-base/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_create(analyze_documents=True, chunk_overlap=0, chunk_size=0, collection_name="...")
```

**Parameters**

Strategy for grouping segmented text into final chunks. 'sentence': groups sentences; 'markdown': respects Markdown structure (headers, code); 'dynamic': optimizes breaks for size using chosen segmentation method.
One of: `advanced`

To analyze document contents and enrich them when parsing

Apify API Key for scraping URLs (optional)

The overlap of the chunks to store in the knowledge base

The size of the chunks to store in the knowledge base

The name of the collection to store the knowledge base in

The embedding model to use for the knowledge base.
Format: provider/model
`cohere/embed-english-light-v3.0`, `cohere/embed-english-v3.0`, `cohere/embed-multilingual-light-v3.0`, `cohere/embed-multilingual-v3.0`, `cohere/embed-v4.0`, `google/embedding-001`, `openai/text-embedding-3-large`, `openai/text-embedding-3-small`, `openai/text-embedding-ada-002`, `voyageai/voyage-3-large`, `voyageai/voyage-3-lite`, `voyageai/voyage-3.5`, `voyageai/voyage-3.5-lite`, `voyageai/voyage-4`, `voyageai/voyage-4-large`, `voyageai/voyage-4-lite`, `voyageai/voyage-code-3`, `voyageai/voyage-context-3`, `voyageai/voyage-finance-2`, `voyageai/voyage-multimodal-3.5`

The embedding provider to use

The file processing implementation to use for parsing documents
One of: `contextual_ai`, `default`, `docling`, `llama_parse`, `mistral_ocr`, `reducto`, `textract`

Whether to create a hybrid knowledge base

The name of the knowledge base to create

The precision to use for the knowledge base

The method to break text into units before chunking. 'words': splits by word; 'sentences': splits by sentence boundary; 'paragraphs': splits by blank line/paragraph.
One of: `paragraphs`, `sentences`, `words`

Whether to shard the knowledge base

The vector database provider to use

## `knowledge_base_fetch_document_content`

Fetch the full content of a specific document from a knowledge base by scrolling through all its chunks

Platform docs: [Fetch the full content of a specific document from a knowledge base by scrolling through all its chunks](/nodes/knowledge-base-fetch-document-content/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_fetch_document_content(item_id="...", knowledge_base=...)
```

**Parameters**

## `knowledge_base_fetch_items`

Advanced knowledge base item fetching with traversal, filtering, and output shaping capabilities

Platform docs: [Advanced knowledge base item fetching with traversal, filtering, and output shaping capabilities](/nodes/knowledge-base-fetch-items/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_fetch_items(knowledge_base=...)
```

**Parameters**

One of: `ALL`, `DOCUMENTS`, `FOLDERS`

One of: `long`, `metadata`, `short`

## `knowledge_base_get_item_bboxes`

Fetch OCR bounding boxes for specific pages of a PDF document in a knowledge base

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_get_item_bboxes(item_id="...", knowledge_base=..., page_ranges="...")
```

**Parameters**

## `knowledge_base_list_items`

List items (documents and folders) from a knowledge base with pagination support

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_list_items(knowledge_base=...)
```

**Parameters**

## `knowledge_base_loader`

Load data into an existing knowledge base.

Platform docs: [Load data into an existing knowledge base.](/nodes/knowledge-base-loader/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_loader(knowledge_base=..., url="...", documents=...)
```

**Parameters**

Select the type of data to load
One of: `File`, `URL`

Scrape sub-pages of the provided link

The knowledge base to load data into

The frequency to rescrape the URL
One of: `Daily`, `Monthly`, `Never`, `Weekly`

The raw URL link (e.g., [https://vectorshift.ai/](https://vectorshift.ai/))

Use a proxy to crawl the website

Load URLs to crawl from a sitemap. If the URL is a sitemap, it will be used directly. If the URL is not a sitemap, the sitemap will be fetched automatically.

The maximum depth of the URL to crawl

The maximum number of recursive URLs to scrape

Whether to only crawl links from the same domain

The file to be added to the selected knowledge base. Note: to convert text to file, use the Text to File node

## `knowledge_base_sync`

Automatically trigger a sync to the integrations in the selected knowledge base

Platform docs: [Automatically trigger a sync to the integrations in the selected knowledge base](/nodes/sync-knowledge-base/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").knowledge_base_sync(knowledge_base=...)
```

**Parameters**

## `semantic_search`

Generate a temporary vector database at run-time and retrieve the most relevant pieces from the documents based on the query.

Platform docs: [Generate a temporary vector database at run-time and retrieve the most relevant pieces from the documents based on the query.](/nodes/semantic-search/overview)

```python Sync theme={"languages":{}}
pipeline.add(name="node").semantic_search(query="...", documents="...")
```

**Parameters**

Use additional LLM calls to analyze each document to improve answer correctness

Filter the content returned from the knowledge base

Additional context passed to advanced search and query analysis

Format the context for the LLM

Filter the documents returned from the knowledge base

Strategy for grouping segmented text into final chunks. 'sentence': groups sentences; 'markdown': respects Markdown structure (headers, code); 'dynamic': optimizes breaks for size using chosen segmentation method.
One of: `dynamic`, `markdown`, `sentence`

The model to use for the embedding
`cohere/embed-english-light-v2.0`, `cohere/embed-english-light-v3.0`, `cohere/embed-english-v2.0`, `cohere/embed-english-v3.0`, `cohere/embed-multilingual-light-v3.0`, `cohere/embed-multilingual-v2.0`, `cohere/embed-multilingual-v3.0`, `google/embedding-001`, `intfloat/multilingual-e5-large`, `openai/text-embedding-3-large`, `openai/text-embedding-3-small`, `openai/text-embedding-ada-002`

The query will be used to search documents for relevant pieces semantically.

To analyze document contents and enrich them when parsing

Additional context to pass to the query analysis and qa steps

Filter the documents returned from the knowledge base

The text for semantic search. Note: you may add multiple upstream nodes to this field.

Filter the content returned from the knowledge base

Whether to create a hybrid knowledge base

The method to break text into units before chunking. 'words': splits by word; 'sentences': splits by sentence boundary; 'paragraphs': splits by blank line/paragraph.

Show intermediate steps

The alpha value for the retrieval

Extract separate questions from the query and retrieve content separately for each question to improve search performance

Do a natural language metadata query

Expand query to improve semantic search

Expand query terms to improve semantic search

The maximum number of relevant chunks to be returned

Refine the initial ranking of returned chunks based on relevancy

Refine the initial ranking of returned chunks based on relevancy
`cohere/rerank-english-v3.0`, `cohere/rerank-multilingual-v3.0`, `cohere/rerank-v3.5`, `cohere/rerank-v4.0-fast`, `cohere/rerank-v4.0-pro`, `contextualai/ctxl-rerank-en-v1-instruct`, `jina/jina-reranker-v2-base-multilingual`, `llm/google/gemini-2.5-flash`, `llm/google/gemini-2.5-flash-lite-preview-06-17`, `llm/google/gemini-2.5-pro`, `opensource/BAAI/bge-reranker-v2-m3`, `together/Salesforce/Llama-Rank-V1`, `together/mixedbread-ai/Mxbai-Rerank-Large-V2`, `voyageai/rerank-2`, `voyageai/rerank-2-lite`, `voyageai/rerank-2.5`, `voyageai/rerank-2.5-lite`

The unit of retrieval. Chunks will return the most relevant chunks, Documents will return document metadata with snippets, and Pages will return complete pages with all chunks from pages containing relevant content
One of: `chunks`, `documents`, `pages`

The score cutoff

Transform the query for better semantic search

The mode to use for the advanced search
One of: `accurate`, `fast`

The model to use for the QA
`claude-3-5-sonnet-20240620`, `claude-3-haiku-20240307`, `claude-3-opus-20240229`, `gemini-2.0-flash-001`, `gemini-2.5-pro-exp-03-25`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-4o`, `gpt-4o-2024-08-06`, `gpt-4o-2024-11-20`, `gpt-4o-mini`, `gpt-4o-mini-2024-07-18`, `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `o1`, `o1-2024-12-17`, `o3`, `o3-mini`, `o3-pro`, `o4-mini`
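
Several of the `semantic_search` settings (chunk size and overlap, the alpha blend, the score cutoff, and the maximum number of returned chunks) interact, so a small standalone sketch may help show how they could fit together. This is not VectorShift's implementation and does not call the SDK: `chunk_words`, `vector_score`, `lexical_score`, and `hybrid_search` are hypothetical names, the bag-of-words cosine is only a toy stand-in for a real embedding model, and the alpha convention (1.0 is pure vector search, 0.0 is pure lexical search) is assumed to match the one documented for `knowledge_base`.

```python
import math
from collections import Counter


def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into chunks of chunk_size words that overlap by chunk_overlap words."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def _tokens(text):
    """Lowercase tokens with surrounding punctuation stripped; empty tokens dropped."""
    stripped = (word.strip(".,;:!?()").lower() for word in text.split())
    return [token for token in stripped if token]


def vector_score(query, chunk):
    """Cosine similarity of bag-of-words counts (toy stand-in for embeddings)."""
    q, c = Counter(_tokens(query)), Counter(_tokens(chunk))
    dot = sum(q[term] * c[term] for term in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0


def lexical_score(query, chunk):
    """Fraction of distinct query terms that appear in the chunk."""
    q, c = set(_tokens(query)), set(_tokens(chunk))
    return len(q & c) / len(q) if q else 0.0


def hybrid_search(query, chunks, alpha=0.5, score_cutoff=0.0, max_chunks=3):
    """Blend both scores with alpha, drop chunks below the cutoff, keep the best max_chunks."""
    scored = []
    for chunk in chunks:
        score = alpha * vector_score(query, chunk) + (1 - alpha) * lexical_score(query, chunk)
        if score >= score_cutoff:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:max_chunks]


if __name__ == "__main__":
    text = (
        "Hybrid retrieval blends vector similarity with lexical matching. "
        "A score cutoff drops weak matches before the top chunks are returned."
    )
    chunks = chunk_words(text, chunk_size=8, chunk_overlap=2)
    for score, chunk in hybrid_search("score cutoff", chunks, alpha=0.5, score_cutoff=0.1, max_chunks=2):
        print(f"{score:.2f}  {chunk}")
```

In this sketch, raising `alpha` shifts weight toward the similarity score, raising `score_cutoff` trades recall for precision, and `max_chunks` caps the result after filtering. How the actual node orders these steps is not stated in this reference.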