Add these nodes with the pipeline builder: pipeline.add(name="...").<node>(...). Each entry lists the node’s configuration parameters. See the Pipeline reference for add, run, and lifecycle methods.

chunking

Split text into chunks. Supports different chunking strategies like markdown-aware, sentence-based, or dynamic sizing.
pipeline.add(name="node").chunking()
Parameters
splitter_method
str
default:"'markdown'"
Strategy for grouping segmented text into final chunks. ‘sentence’: groups sentences; ‘markdown’: respects Markdown structure (headers, code); ‘dynamic’: optimizes breaks for size using chosen segmentation method. One of: dynamic, markdown, sentence
text
str
default:"''"
The text to chunk
chunk_overlap
int
default:"0"
The amount of overlap between consecutive chunks of text.
chunk_size
int
default:"512"
The size of each chunk of text.
segmentation_method
str
default:"'words'"
The method to break text into units before chunking. ‘words’: splits by word; ‘sentences’: splits by sentence boundary; ‘paragraphs’: splits by blank line/paragraph. One of: paragraphs, sentences, words
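
To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sketch of sliding-window chunking over words. It is illustrative only and is not the node's implementation: it assumes sizes are counted in words (this reference does not state the unit), and `chunk_words` is a hypothetical helper, not part of the pipeline API.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of chunk_size words.

    Each window shares chunk_overlap words with the previous one.
    Assumption: sizes are measured in words, which the reference does not confirm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_words("a b c d e f g", chunk_size=3, chunk_overlap=1)
```

With chunk_size=3 and chunk_overlap=1 the window advances two words at a time, so each chunk repeats the last word of the one before it.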

create_workspace

Create a new workspace in a portal, upload files to its knowledge base, and share with users
pipeline.add(name="node").create_workspace(name="...", portal=...)
Parameters
files
AcceptsFileList
default:"[]"
name
str
required
portal
AcceptsPortal
required
shared_emails
list[str]
default:"[]"

knowledge_base

Semantically query a knowledge base that can contain files, scraped URLs, and data from synced integrations (e.g., Google Drive).
pipeline.add(name="node").knowledge_base(query="...", knowledge_base=...)
Parameters
do_advanced_qa
bool
default:"False"
Use additional LLM calls to analyze each document to improve answer correctness
enable_filter
bool
default:"False"
Filter the content returned from the knowledge base. Agents should provide structured metadata filters directly in the filter input when useful.
enable_context
bool
default:"False"
Enable the context input, which passes additional context to the query analysis and QA steps
format_context_for_llm
bool
default:"True"
Format the context for the LLM
enable_document_db_filter
bool
default:"False"
Enable the document DB filter
set_response_format
bool
default:"False"
Generate an LLM response from the retrieved context
stream_response
bool
default:"False"
Whether to stream the LLM response
query
str
required
The query used to search documents semantically for relevant content. It must not be empty and should include only information relevant to retrieval or metadata filter generation. Generally, expand acronyms and abbreviations, but keep the original acronym or abbreviation as well.
context
str
default:"''"
Additional context to pass to the query analysis and QA steps
document_db_filter
str
default:"''"
Filter the documents returned from the knowledge base
filter
str
default:"''"
Structured metadata filter JSON for the knowledge base query. Use a top-level boolean clause such as {"type":"condition","field":"title","operator":"match","value":"Q4 report"}; leave empty when no hard metadata constraint is needed.
generate_metadata_filters
bool
default:"False"
Use an LLM to generate metadata filters to refine your query. Agents should usually leave this false and provide filters directly in the filter input.
knowledge_base
AcceptsKnowledgeBase
required
Select an existing knowledge base. Use $object.knowledge_base.? syntax.
system_prompt
str
default:"''"
The system prompt to use for the LLM
retrieval
RetrievalConfig
rerank
RerankConfig
query_enhancement
QueryEnhancementConfig
alpha
float
The alpha value for the retrieval. 1.0 is pure vector search and 0.0 is pure lexical search
answer_multiple_questions
bool
Extract separate questions from the query and retrieve content separately for each question to improve search performance
do_nl_metadata_query
bool
Do a natural language metadata query
expand_query
bool
Expand query to improve semantic search
expand_query_terms
bool
Expand query terms to improve semantic search
num_chunks_to_rerank
int
The number of chunks to rerank
rerank_documents
bool
Rerank the documents returned from the knowledge base
rerank_model
str
Refine the initial ranking of returned chunks based on relevancy
retrieval_unit
str
The unit of retrieval. Chunks returns the most relevant chunks from the knowledge base along with their text content. Documents returns the document metadata along with the most relevant snippets from each document. Pages returns complete pages, including all chunks from pages containing relevant content. One of: chunks, documents, pages
score_cutoff
float
The score cutoff
top_k
int
The number of relevant chunks to be returned
transform_query
bool
Transform the query for better semantic search
advanced_search_mode
str
default:"'accurate'"
The mode to use for the advanced search. One of: accurate, fast
qa_model_name
str
default:"'gpt-4o-mini'"
The model to use for the QA
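
Because filter is a str parameter that carries JSON, it can help to build the clause as a dictionary and serialize it rather than hand-writing the string. The sketch below reproduces only the single condition clause shown in the filter description above; `condition_filter` is a hypothetical helper, and no other clause types or operators are assumed.

```python
import json


def condition_filter(field: str, operator: str, value: str) -> str:
    """Serialize one metadata condition into a JSON string.

    The keys mirror the example clause in this reference; the helper itself
    is illustrative and not part of the pipeline API.
    """
    clause = {"type": "condition", "field": field, "operator": operator, "value": value}
    return json.dumps(clause)


# Same clause as the example in the filter description.
filter_json = condition_filter("title", "match", "Q4 report")
```

The resulting string would then be passed as the filter argument of the knowledge_base node, presumably together with enable_filter=True (an assumption based on the parameter descriptions).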

knowledge_base_actions

Create, load, and sync Knowledge Bases
pipeline.add(name="node").knowledge_base_actions()
Parameters
sub_type
str
default:"''"

knowledge_base_agent

Query a knowledge base using an agentic approach with tools.
pipeline.add(name="node").knowledge_base_agent(query="...", knowledge_base=...)
Parameters
provider
str
default:"'google'"
Select the LLM provider to be used by the agent. One of: google
mode
str
default:"'focused'"
Controls the query effort: ‘fast’ for quick answers, ‘focused’ for balanced depth, ‘deep’ for thorough analysis. One of: deep, fast, focused
accept_additional_context
bool
default:"False"
If enabled, shows an additional context input to provide context to the agent
return_context
bool
default:"False"
If enabled, returns the relevant context/chunks used to generate the answer
model
str
default:"'gemini-3-flash-preview'"
Select the LLM model to be used by the agent. One of: gemini-3-flash-preview
query
str
required
The natural language query. The agent will use this to determine the best way to query the knowledge base. Include the key criteria needed to answer the query.
context
str
default:"''"
Optional additional context to help the agent understand the query better (e.g., conversation history, user preferences)
knowledge_base
AcceptsKnowledgeBase
required
Select an existing knowledge base. You must provide the id in $.object.knowledge_base.id format
return_answer
bool
default:"True"
If enabled, generates a synthesized answer from the knowledge base

knowledge_base_create

Dynamically create a Knowledge Base with configured options
pipeline.add(name="node").knowledge_base_create(analyze_documents=True, chunk_overlap=0, chunk_size=0, collection_name="...")
Parameters
splitter_method
str
default:"'advanced'"
Strategy for grouping segmented text into final chunks. ‘sentence’: groups sentences; ‘markdown’: respects Markdown structure (headers, code); ‘dynamic’: optimizes breaks for size using chosen segmentation method. One of: advanced
analyze_documents
bool
required
Whether to analyze document contents and enrich them during parsing
apify_key
str
default:"''"
Apify API Key for scraping URLs (optional)
chunk_overlap
int
required
The overlap of the chunks to store in the knowledge base
chunk_size
int
required
The size of the chunks to store in the knowledge base
collection_name
str
required
The name of the collection to store the knowledge base in
embedding_model
str
required
The embedding model to use for the knowledge base. Format: provider/model
embedding_provider
str
required
The embedding provider to use
file_processing_implementation
str
required
The file processing implementation to use for parsing documents. One of: contextual_ai, default, docling, llama_parse, mistral_ocr, reducto, textract
is_hybrid
bool
required
Whether to create a hybrid knowledge base
name
str
required
The name of the knowledge base to create
precision
str
required
The precision to use for the knowledge base
segmentation_method
str
default:"'words'"
The method to break text into units before chunking. ‘words’: splits by word; ‘sentences’: splits by sentence boundary; ‘paragraphs’: splits by blank line/paragraph. One of: paragraphs, sentences, words
sharded
bool
required
Whether to shard the knowledge base
vector_db_provider
str
required
The vector database provider to use
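
knowledge_base_create has many required parameters, so a small pre-flight check can catch omissions before the node is added. The sketch below uses only the parameter names and allowed values listed above; `check_create_kwargs` is a hypothetical helper, and it does not validate values such as embedding_model or vector_db_provider, whose accepted options are not listed in this reference.

```python
# Parameters marked "required" for knowledge_base_create in this reference.
REQUIRED_PARAMS = (
    "analyze_documents", "chunk_overlap", "chunk_size", "collection_name",
    "embedding_model", "embedding_provider", "file_processing_implementation",
    "is_hybrid", "name", "precision", "sharded", "vector_db_provider",
)

# Allowed values listed for file_processing_implementation.
FILE_PROCESSORS = {
    "contextual_ai", "default", "docling", "llama_parse",
    "mistral_ocr", "reducto", "textract",
}


def check_create_kwargs(kwargs: dict) -> list[str]:
    """Return a list of problems found in kwargs; an empty list means none were found."""
    problems = [
        f"missing required parameter: {name}"
        for name in REQUIRED_PARAMS
        if name not in kwargs
    ]
    impl = kwargs.get("file_processing_implementation")
    if impl is not None and impl not in FILE_PROCESSORS:
        problems.append(f"unknown file_processing_implementation: {impl}")
    return problems
```

Calling `check_create_kwargs({"name": "docs"})` would report the eleven other required parameters as missing.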

knowledge_base_fetch_document_content

Fetch the full content of a specific document from a knowledge base by scrolling through all its chunks
pipeline.add(name="node").knowledge_base_fetch_document_content(item_id="...", knowledge_base=...)
Parameters
item_id
str
required
knowledge_base
AcceptsKnowledgeBase
required
max_chunks
int
default:"200"
offset
int
default:"0"
page_ranges
str
default:"''"

knowledge_base_fetch_items

Advanced knowledge base item fetching with traversal, filtering, and output shaping capabilities
pipeline.add(name="node").knowledge_base_fetch_items(knowledge_base=...)
Parameters
filter
str
default:"''"
item_return_type
str
default:"'ALL'"
One of: ALL, DOCUMENTS, FOLDERS
knowledge_base
AcceptsKnowledgeBase
required
limit
int
default:"50"
max_depth
int
default:"0"
offset
int
default:"0"
root_ids
str
default:"''"
sort_fields
str
default:"''"
sort_orders
str
default:"''"
verbosity
str
default:"'metadata'"
One of: long, metadata, short

knowledge_base_get_item_bboxes

Fetch OCR bounding boxes for specific pages of a PDF document in a knowledge base
pipeline.add(name="node").knowledge_base_get_item_bboxes(item_id="...", knowledge_base=..., page_ranges="...")
Parameters
item_id
str
required
knowledge_base
AcceptsKnowledgeBase
required
page_ranges
str
required

knowledge_base_list_items

List items (documents and folders) from a knowledge base with pagination support
pipeline.add(name="node").knowledge_base_list_items(knowledge_base=...)
Parameters
knowledge_base
AcceptsKnowledgeBase
required
limit
int
default:"50"
list_all_items
bool
default:"False"
offset
int
default:"0"
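
The limit and offset parameters suggest the usual paging loop: request a page, advance offset by limit, and stop when a page comes back short. The sketch below shows that loop in isolation. `fetch_page` is a hypothetical callable standing in for one run of the node; this reference does not describe the node's output shape, so the sketch assumes each call returns a plain list of items.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int, int], list], limit: int = 50) -> Iterator:
    """Yield items page by page until a page is shorter than limit.

    fetch_page(limit, offset) is an assumed stand-in for one node run.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            break  # a short (or empty) page means there is nothing left
        offset += limit


# Usage with an in-memory stand-in for the knowledge base.
data = list(range(120))
items = list(iter_items(lambda limit, offset: data[offset:offset + limit], limit=50))
```

When the total number of items is an exact multiple of limit, the loop makes one extra request that returns an empty page before stopping.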

knowledge_base_loader

Load data into an existing knowledge base.
pipeline.add(name="node").knowledge_base_loader(knowledge_base=..., url="...", documents=...)
Parameters
document_type
str
default:"'File'"
Select the type of data to load. One of: File, URL
recursive
bool
default:"False"
Scrape sub-pages of the provided link
knowledge_base
AcceptsKnowledgeBase
required
The knowledge base to load data into
rescrape_frequency
str
default:"'Never'"
How often to rescrape the URL. One of: Daily, Monthly, Never, Weekly
url
str
required
The raw URL link (e.g., https://vectorshift.ai/)
use_proxy
bool
default:"False"
Use a proxy to crawl the website
load_sitemap
bool
default:"False"
Load URLs to crawl from a sitemap. If the URL is a sitemap, it will be used directly. If the URL is not a sitemap, the sitemap will be fetched automatically.
max_depth
int
default:"5"
The maximum depth of the URL to crawl
max_recursive_urls
int
default:"10"
The maximum number of recursive URLs to scrape
same_domain_only
bool
default:"False"
Whether to only crawl links from the same domain
documents
AcceptsFileList
required
The file to be added to the selected knowledge base. Note: to convert text to file, use the Text to File node

knowledge_base_sync

Automatically trigger a sync to the integrations in the selected knowledge base
pipeline.add(name="node").knowledge_base_sync(knowledge_base=...)
Parameters
knowledge_base
AcceptsKnowledgeBase
required

semantic_search

Generate a temporary vector database at run time and retrieve the most relevant pieces from the documents based on the query.
pipeline.add(name="node").semantic_search(query="...", documents="...")
Parameters
do_advanced_qa
bool
default:"False"
Use additional LLM calls to analyze each document to improve answer correctness
enable_filter
bool
default:"False"
Filter the content returned from the knowledge base
enable_context
bool
default:"False"
Additional context passed to advanced search and query analysis
format_context_for_llm
bool
default:"False"
Format the context for the LLM
enable_document_db_filter
bool
default:"False"
Filter the documents returned from the knowledge base
splitter_method
str
default:"'markdown'"
Strategy for grouping segmented text into final chunks. ‘sentence’: groups sentences; ‘markdown’: respects Markdown structure (headers, code); ‘dynamic’: optimizes breaks for size using chosen segmentation method. One of: dynamic, markdown, sentence
model
str
default:"'openai/text-embedding-3-small'"
The model to use for the embedding
query
str
required
The query will be used to search documents for relevant pieces semantically.
analyze_documents
bool
default:"False"
Whether to analyze document contents and enrich them during parsing
context
str
default:"''"
Additional context to pass to the query analysis and QA steps
document_db_filter
str
default:"''"
Filter the documents returned from the knowledge base
documents
str
required
The text for semantic search. Note: you may add multiple upstream nodes to this field.
filter
str
default:"''"
Filter the content returned from the knowledge base
is_hybrid
bool
default:"False"
Whether to create a hybrid knowledge base
segmentation_method
str
default:"'words'"
The method to break text into units before chunking. ‘words’: splits by word; ‘sentences’: splits by sentence boundary; ‘paragraphs’: splits by blank line/paragraph.
show_intermediate_steps
bool
default:"False"
Show intermediate steps
retrieval
RetrievalConfig
rerank
RerankConfig
query_enhancement
QueryEnhancementConfig
alpha
float
The alpha value for the retrieval
answer_multiple_questions
bool
Extract separate questions from the query and retrieve content separately for each question to improve search performance
do_nl_metadata_query
bool
Do a natural language metadata query
expand_query
bool
Expand query to improve semantic search
expand_query_terms
bool
Expand query terms to improve semantic search
max_docs_per_query
int
The maximum number of relevant chunks to be returned
rerank_documents
bool
Refine the initial ranking of returned chunks based on relevancy
rerank_model
str
Refine the initial ranking of returned chunks based on relevancy
retrieval_unit
str
The unit of retrieval. Chunks returns the most relevant chunks, Documents returns document metadata with snippets, and Pages returns complete pages with all chunks from pages containing relevant content. One of: chunks, documents, pages
score_cutoff
float
The score cutoff
transform_query
bool
Transform the query for better semantic search
advanced_search_mode
str
default:"'accurate'"
The mode to use for the advanced search. One of: accurate, fast
qa_model_name
str
default:"'gpt-4o-mini'"
The model to use for the QA
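
The knowledge_base entry describes alpha as a blend between vector search (1.0) and lexical search (0.0). A common way to realize such a weight is a convex combination of the two scores, sketched below. This is an assumption about how the blend could work, not a description of the service's actual scoring, and `hybrid_score` is a hypothetical helper.

```python
def hybrid_score(vector_score: float, lexical_score: float, alpha: float) -> float:
    """Blend two relevance scores with weight alpha.

    alpha = 1.0 keeps only the vector score; alpha = 0.0 keeps only the
    lexical score. The linear formula is assumed, not documented.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector_score + (1.0 - alpha) * lexical_score


blended = hybrid_score(vector_score=0.8, lexical_score=0.2, alpha=0.5)
```

Under this assumption, values of alpha near 1.0 favor semantic similarity, while values near 0.0 favor exact term matches.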