Knowledge nodes

Add these nodes with the pipeline builder: pipeline.add(name="...").<node>(...). Each entry lists the node’s configuration parameters. See the Pipeline reference for add, run, and lifecycle methods.

`chunking` — Chunking

Split text into chunks using a Markdown-aware, sentence-based, or dynamic-sizing strategy.

pipeline.add(name="node").chunking()

Parameters

splitter_method

str

default:"'markdown'"

Strategy for grouping segmented text into final chunks. ‘sentence’ groups sentences; ‘markdown’ respects Markdown structure (headers, code); ‘dynamic’ optimizes breaks for size using the chosen segmentation method. One of: dynamic, markdown, sentence.

text

str

default:"''"

The text to chunk

chunk_overlap

int

default:"0"

The amount of text shared between consecutive chunks.

chunk_size

int

default:"512"

The size of each chunk of text.

segmentation_method

str

default:"'words'"

The method used to break text into units before chunking. ‘words’ splits by word; ‘sentences’ splits by sentence boundary; ‘paragraphs’ splits by blank line or paragraph. One of: paragraphs, sentences, words.
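The interaction between chunk_size and chunk_overlap can be illustrated with a minimal sliding-window sketch. This is not the node's implementation: the function name `chunk_words` is hypothetical, and it assumes sizes are counted in whitespace-separated words (matching the default ‘words’ segmentation), which the reference above does not state.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 0) -> list[str]:
    """Split text into word windows; consecutive windows share chunk_overlap words.

    Assumption: sizes are measured in words. The real node may count
    tokens or characters instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=4, chunk_overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

A larger overlap repeats more text across chunk boundaries, which tends to preserve context at the cost of producing more chunks.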

`create_workspace` — Create Workspace

Create a new workspace in a portal, upload files to its knowledge base, and share with users

Platform docs: Create Workspace

pipeline.add(name="node").create_workspace(name="...", portal=...)

Parameters

files

AcceptsFileList

default:"[]"

name

str

required

portal

AcceptsPortal

required

shared_emails

list[str]

default:"[]"

`knowledge_base` — Knowledge Base

Semantically query a knowledge base that can contain files, scraped URLs, and data from synced integrations (e.g., Google Drive).

Platform docs: Knowledge Base

pipeline.add(name="node").knowledge_base(query="...", knowledge_base=...)

Parameters

do_advanced_qa

bool

default:"False"

Use additional LLM calls to analyze each document to improve answer correctness

enable_filter

bool

default:"False"

Filter the content returned from the knowledge base. Agents should provide structured metadata filters directly in the filter input when useful.

enable_context

bool

default:"False"

Enable the context input, which passes additional context to the query analysis and QA steps

format_context_for_llm

bool

default:"True"

Format the context for the LLM

enable_document_db_filter

bool

default:"False"

Enable the document_db_filter input, which filters the documents returned from the knowledge base

set_response_format

bool

default:"False"

Generate an LLM response from the retrieved context

stream_response

bool

default:"False"

Whether to stream the LLM response

query

str

required

The query used to semantically search documents for relevant content. Must not be empty. Include only information relevant to retrieval or metadata filter generation. In general, expand acronyms and abbreviations, but keep the original acronym or abbreviation as well.

context

str

default:"''"

Additional context to pass to the query analysis and QA steps

document_db_filter

str

default:"''"

Filter the documents returned from the knowledge base

filter

str

default:"''"

Structured metadata filter JSON for the knowledge base query. Use a top-level boolean clause such as {"type":"condition","field":"title","operator":"match","value":"Q4 report"}; leave empty when no hard metadata constraint is needed.

generate_metadata_filters

bool

default:"False"

Use an LLM to generate metadata filters to refine your query. Agents should usually leave this false and provide filters directly in the filter input.

knowledge_base

AcceptsKnowledgeBase

required

Select an existing knowledge base. Use $object.knowledge_base.? syntax.

system_prompt

str

default:"''"

The system prompt to use for the LLM

retrieval

RetrievalConfig

rerank

RerankConfig

query_enhancement

QueryEnhancementConfig

alpha

float

The alpha value for the retrieval. 1.0 is pure vector search and 0.0 is pure lexical search

answer_multiple_questions

bool

Extract separate questions from the query and retrieve content separately for each question to improve search performance

do_nl_metadata_query

bool

Do a natural language metadata query

expand_query

bool

Expand query to improve semantic search

expand_query_terms

bool

Expand query terms to improve semantic search

num_chunks_to_rerank

int

The number of chunks to rerank

rerank_documents

bool

Rerank the documents returned from the knowledge base

rerank_model

str

Refine the initial ranking of returned chunks based on relevancy

Allowed values:

cohere/rerank-english-v3.0, cohere/rerank-multilingual-v3.0, cohere/rerank-v3.5, cohere/rerank-v4.0-fast, cohere/rerank-v4.0-pro, contextualai/ctxl-rerank-en-v1-instruct, jina/jina-reranker-v2-base-multilingual, llm/google/gemini-2.5-flash, llm/google/gemini-2.5-flash-lite-preview-06-17, llm/google/gemini-2.5-pro, opensource/BAAI/bge-reranker-v2-m3, together/Salesforce/Llama-Rank-V1, together/mixedbread-ai/Mxbai-Rerank-Large-V2, voyageai/rerank-2, voyageai/rerank-2-lite, voyageai/rerank-2.5, voyageai/rerank-2.5-lite

retrieval_unit

str

The unit of retrieval. ‘chunks’ returns the most relevant chunks from the knowledge base along with their text content. ‘documents’ returns document metadata along with the most relevant snippets from each document. ‘pages’ returns complete pages, including all chunks from pages that contain relevant content. One of: chunks, documents, pages.

score_cutoff

float

The score cutoff

top_k

int

The number of relevant chunks to be returned

transform_query

bool

Transform the query for better semantic search

advanced_search_mode

str

default:"'accurate'"

The mode to use for the advanced search. One of: accurate, fast.

qa_model_name

str

default:"'gpt-4o-mini'"

The model to use for the QA

Allowed values:

claude-3-5-sonnet-20240620, claude-3-haiku-20240307, claude-3-opus-20240229, gemini-2.0-flash-001, gemini-2.5-pro-exp-03-25, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-2024-08-06, gpt-4o-2024-11-20, gpt-4o-mini, gpt-4o-mini-2024-07-18, gpt-5, gpt-5-mini, gpt-5-nano, o1, o1-2024-12-17, o3, o3-mini, o3-pro, o4-mini
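The filter parameter above expects a JSON string. As a small sketch, the example clause from its description can be built with the standard library rather than written by hand. The helper name `condition` is hypothetical, and only the ‘match’ operator appears in this reference, so no other operators are assumed.

```python
import json


def condition(field: str, operator: str, value: str) -> dict:
    """Build one condition clause in the shape shown in the filter description."""
    return {"type": "condition", "field": field, "operator": operator, "value": value}


# Compact separators reproduce the example string from the parameter description.
filter_json = json.dumps(condition("title", "match", "Q4 report"), separators=(",", ":"))
print(filter_json)
# {"type":"condition","field":"title","operator":"match","value":"Q4 report"}
```

The resulting string would presumably be passed as the filter argument with enable_filter set to True; this reference does not show the exact call, so treat that pairing as an assumption.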

`knowledge_base_actions` — Knowledge Base Actions

Create, load, and sync Knowledge Bases

pipeline.add(name="node").knowledge_base_actions()

Parameters

sub_type

str

default:"''"

`knowledge_base_agent` — Knowledge Base Agent

Query a knowledge base using an agentic approach with tools.

Platform docs: Knowledge Base Agent

pipeline.add(name="node").knowledge_base_agent(query="...", knowledge_base=...)

Parameters

provider

str

default:"'google'"

Select the LLM provider to be used by the agent. One of: google.

mode

str

default:"'focused'"

Controls the query effort: ‘fast’ for quick answers, ‘focused’ for balanced depth, ‘deep’ for thorough analysis. One of: deep, fast, focused.

accept_additional_context

bool

default:"False"

If enabled, shows an additional context input to provide context to the agent

return_context

bool

default:"False"

If enabled, returns the relevant context/chunks used to generate the answer

model

str

default:"'gemini-3-flash-preview'"

Select the LLM model to be used by the agent. One of: gemini-3-flash-preview.

query

str

required

The natural language query. The agent will use this to determine the best way to query the knowledge base. Include the key criteria needed to answer the query.

context

str

default:"''"

Optional additional context to help the agent understand the query better (e.g., conversation history, user preferences)

knowledge_base

AcceptsKnowledgeBase

required

Select an existing knowledge base. You must provide the id in $.object.knowledge_base.id format

return_answer

bool

default:"True"

If enabled, generates a synthesized answer from the knowledge base

`knowledge_base_create` — Knowledge Base Create

Dynamically create a Knowledge Base with configured options

Platform docs: Knowledge Base Create

pipeline.add(name="node").knowledge_base_create(analyze_documents=True, chunk_overlap=0, chunk_size=0, collection_name="...")

Parameters

splitter_method

str

default:"'advanced'"

analyze_documents

bool

required

Whether to analyze document contents and enrich them during parsing

apify_key

str

default:"''"

Apify API Key for scraping URLs (optional)

chunk_overlap

int

required

The overlap of the chunks to store in the knowledge base

chunk_size

int

required

The size of the chunks to store in the knowledge base

collection_name

str

required

The name of the collection to store the knowledge base in

embedding_model

str

required

The embedding model to use for the knowledge base. Format: provider/model

Allowed values:

cohere/embed-english-light-v3.0, cohere/embed-english-v3.0, cohere/embed-multilingual-light-v3.0, cohere/embed-multilingual-v3.0, cohere/embed-v4.0, google/embedding-001, openai/text-embedding-3-large, openai/text-embedding-3-small, openai/text-embedding-ada-002, voyageai/voyage-3-large, voyageai/voyage-3-lite, voyageai/voyage-3.5, voyageai/voyage-3.5-lite, voyageai/voyage-4, voyageai/voyage-4-large, voyageai/voyage-4-lite, voyageai/voyage-code-3, voyageai/voyage-context-3, voyageai/voyage-finance-2, voyageai/voyage-multimodal-3.5

embedding_provider

str

required

The embedding provider to use

file_processing_implementation

str

required

The file processing implementation to use for parsing documents. One of: contextual_ai, default, docling, llama_parse, mistral_ocr, reducto, textract.

is_hybrid

bool

required

Whether to create a hybrid knowledge base

name

str

required

The name of the knowledge base to create

precision

str

required

The precision to use for the knowledge base

segmentation_method

str

default:"'words'"

sharded

bool

required

Whether to shard the knowledge base

vector_db_provider

str

required

The vector database provider to use

`knowledge_base_fetch_document_content` — Knowledge Base Fetch Document Content

Fetch the full content of a specific document from a knowledge base by scrolling through all its chunks

Platform docs: Knowledge Base Fetch Document Content

pipeline.add(name="node").knowledge_base_fetch_document_content(item_id="...", knowledge_base=...)

Parameters

item_id

str

required

knowledge_base

AcceptsKnowledgeBase

required

max_chunks

int

default:"200"

offset

int

default:"0"

page_ranges

str

default:"''"

`knowledge_base_fetch_items` — Knowledge Base Fetch Items

Advanced knowledge base item fetching with traversal, filtering, and output shaping capabilities

Platform docs: Knowledge Base Fetch Items

pipeline.add(name="node").knowledge_base_fetch_items(knowledge_base=...)

Parameters

filter

str

default:"''"

item_return_type

str

default:"'ALL'"

One of: ALL, DOCUMENTS, FOLDERS

knowledge_base

AcceptsKnowledgeBase

required

limit

int

default:"50"

max_depth

int

default:"0"

offset

int

default:"0"

root_ids

str

default:"''"

sort_fields

str

default:"''"

sort_orders

str

default:"''"

verbosity

str

default:"'metadata'"

One of: long, metadata, short

`knowledge_base_get_item_bboxes` — Knowledge Base Get Item Bboxes

Fetch OCR bounding boxes for specific pages of a PDF document in a knowledge base

pipeline.add(name="node").knowledge_base_get_item_bboxes(item_id="...", knowledge_base=..., page_ranges="...")

Parameters

item_id

str

required

knowledge_base

AcceptsKnowledgeBase

required

page_ranges

str

required

`knowledge_base_list_items` — Knowledge Base List Items

List items (documents and folders) from a knowledge base with pagination support

pipeline.add(name="node").knowledge_base_list_items(knowledge_base=...)

Parameters

knowledge_base

AcceptsKnowledgeBase

required

limit

int

default:"50"

list_all_items

bool

default:"False"

offset

int

default:"0"
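The limit and offset parameters above suggest conventional offset pagination. The sketch below shows that pattern in plain Python. It assumes offset counts items rather than pages and that a page shorter than limit marks the end; neither is stated in this reference. `fetch_page` is a hypothetical stand-in for whatever call actually runs the node.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int, int], list], limit: int = 50) -> Iterator:
    """Yield items page by page until a short or empty page signals the end."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return  # fewer items than requested: nothing left to fetch
        offset += limit


# Stand-in data source used only to make the sketch runnable.
_FAKE_ITEMS = [f"doc-{i}" for i in range(120)]


def fake_fetch_page(limit: int, offset: int) -> list:
    return _FAKE_ITEMS[offset:offset + limit]


all_items = list(iter_items(fake_fetch_page, limit=50))
print(len(all_items))  # 120
```

The list_all_items flag presumably removes the need for this loop when every item is wanted in a single run.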

`knowledge_base_loader` — Knowledge Base Loader

Load data into an existing knowledge base.

Platform docs: Knowledge Base Loader

pipeline.add(name="node").knowledge_base_loader(knowledge_base=..., url="...", documents=...)

Parameters

document_type

str

default:"'File'"

Select the type of data to load. One of: File, URL.

recursive

bool

default:"False"

Scrape sub-pages of the provided link

knowledge_base

AcceptsKnowledgeBase

required

The knowledge base to load data into

rescrape_frequency

str

default:"'Never'"

How often to rescrape the URL. One of: Daily, Monthly, Never, Weekly.

url

str

required

The raw URL link (e.g., https://vectorshift.ai/)

use_proxy

bool

default:"False"

Use a proxy to crawl the website

load_sitemap

bool

default:"False"

Load URLs to crawl from a sitemap. If the URL is a sitemap, it will be used directly. If the URL is not a sitemap, the sitemap will be fetched automatically.

max_depth

int

default:"5"

The maximum link depth to crawl, starting from the provided URL

max_recursive_urls

int

default:"10"

The maximum number of recursive URLs to scrape

same_domain_only

bool

default:"False"

Whether to only crawl links from the same domain

documents

AcceptsFileList

required

The files to add to the selected knowledge base. Note: to convert text to a file, use the Text to File node.
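To make the crawl-related options above concrete, the following sketch shows how same_domain_only and max_recursive_urls could jointly narrow a list of discovered links. It is illustrative only: `select_links` is a hypothetical helper, the crawler's real selection logic is not documented here, and "same domain" is approximated as an exact host match.

```python
from urllib.parse import urlparse


def select_links(start_url: str, links: list[str],
                 same_domain_only: bool = False,
                 max_recursive_urls: int = 10) -> list[str]:
    """Keep unique links, optionally only those on the start URL's host, up to a cap."""
    start_host = urlparse(start_url).netloc.lower()
    selected: list[str] = []
    for link in links:
        if len(selected) >= max_recursive_urls:
            break  # cap reached
        # Assumption: "same domain" means an identical host, so subdomains differ.
        if same_domain_only and urlparse(link).netloc.lower() != start_host:
            continue
        if link not in selected:
            selected.append(link)
    return selected


found = [
    "https://vectorshift.ai/docs",
    "https://example.com/elsewhere",
    "https://vectorshift.ai/blog",
]
print(select_links("https://vectorshift.ai/", found, same_domain_only=True))
# ['https://vectorshift.ai/docs', 'https://vectorshift.ai/blog']
```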

`knowledge_base_sync` — Knowledge Base Sync

Automatically trigger a sync to the integrations in the selected knowledge base

Platform docs: Knowledge Base Sync

pipeline.add(name="node").knowledge_base_sync(knowledge_base=...)

Parameters

knowledge_base

AcceptsKnowledgeBase

required

`semantic_search` — Semantic Search

Generate a temporary vector database at run-time and retrieve the most relevant pieces from the documents based on the query.

Platform docs: Semantic Search

pipeline.add(name="node").semantic_search(query="...", documents="...")

Parameters

do_advanced_qa

bool

default:"False"

Use additional LLM calls to analyze each document to improve answer correctness

enable_filter

bool

default:"False"

Filter the content returned from the knowledge base

enable_context

bool

default:"False"

Additional context passed to advanced search and query analysis

format_context_for_llm

bool

default:"False"

Format the context for the LLM

enable_document_db_filter

bool

default:"False"

Filter the documents returned from the knowledge base

splitter_method

str

default:"'markdown'"

model

str

default:"'openai/text-embedding-3-small'"

The model to use for the embedding

Allowed values:

cohere/embed-english-light-v2.0, cohere/embed-english-light-v3.0, cohere/embed-english-v2.0, cohere/embed-english-v3.0, cohere/embed-multilingual-light-v3.0, cohere/embed-multilingual-v2.0, cohere/embed-multilingual-v3.0, google/embedding-001, intfloat/multilingual-e5-large, openai/text-embedding-3-large, openai/text-embedding-3-small, openai/text-embedding-ada-002

query

str

required

The query will be used to search documents for relevant pieces semantically.

analyze_documents

bool

default:"False"

Whether to analyze document contents and enrich them during parsing

context

str

default:"''"

Additional context to pass to the query analysis and qa steps

document_db_filter

str

default:"''"

Filter the documents returned from the knowledge base

documents

str

required

The text for semantic search. Note: you may add multiple upstream nodes to this field.

filter

str

default:"''"

Filter the content returned from the knowledge base

is_hybrid

bool

default:"False"

Whether to create a hybrid knowledge base

segmentation_method

str

default:"'words'"

The method to break text into units before chunking. ‘words’: splits by word; ‘sentences’: splits by sentence boundary; ‘paragraphs’: splits by blank line/paragraph.

show_intermediate_steps

bool

default:"False"

Show intermediate steps

retrieval

RetrievalConfig

rerank

RerankConfig

query_enhancement

QueryEnhancementConfig

alpha

float

The alpha value for the retrieval

answer_multiple_questions

bool

Extract separate questions from the query and retrieve content separately for each question to improve search performance

do_nl_metadata_query

bool

Do a natural language metadata query

expand_query

bool

Expand query to improve semantic search

expand_query_terms

bool

Expand query terms to improve semantic search

max_docs_per_query

int

The maximum number of relevant chunks to be returned

rerank_documents

bool

Refine the initial ranking of returned chunks based on relevancy

rerank_model

str

Refine the initial ranking of returned chunks based on relevancy

retrieval_unit

str

The unit of retrieval. ‘chunks’ returns the most relevant chunks, ‘documents’ returns document metadata with snippets, and ‘pages’ returns complete pages, including all chunks from pages that contain relevant content. One of: chunks, documents, pages.

score_cutoff

float

The score cutoff

transform_query

bool

Transform the query for better semantic search

advanced_search_mode

str

default:"'accurate'"

The mode to use for the advanced search. One of: accurate, fast.

qa_model_name

str

default:"'gpt-4o-mini'"

The model to use for the QA
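The retrieval options alpha, score_cutoff, and max_docs_per_query can be pictured together with a short sketch. The `knowledge_base` entry describes alpha as 1.0 for pure vector search and 0.0 for pure lexical search; the code below assumes `semantic_search` follows the same convention and that the two scores are blended by linear interpolation, which is a common approach but is not confirmed by this reference. `hybrid_rank` and its default values are hypothetical.

```python
def hybrid_rank(candidates: list[tuple[str, float, float]],
                alpha: float = 0.5,
                score_cutoff: float = 0.0,
                max_docs_per_query: int = 5) -> list[tuple[str, float]]:
    """Blend vector and lexical scores, drop low scores, keep the best few.

    candidates holds (chunk_id, vector_score, lexical_score) tuples with
    scores assumed to be normalized to [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    # Assumed blend: alpha weights the vector score, (1 - alpha) the lexical score.
    scored = [(cid, alpha * vec + (1.0 - alpha) * lex) for cid, vec, lex in candidates]
    kept = [(cid, score) for cid, score in scored if score >= score_cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_docs_per_query]


candidates = [("a", 0.9, 0.1), ("b", 0.2, 0.8), ("c", 0.5, 0.5)]
print([cid for cid, _ in hybrid_rank(candidates, alpha=1.0)])  # ['a', 'c', 'b']
print([cid for cid, _ in hybrid_rank(candidates, alpha=0.0)])  # ['b', 'c', 'a']
```

Moving alpha toward 1.0 favors semantically similar chunks, while moving it toward 0.0 favors exact term matches; the cutoff and the cap then trim the blended ranking.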