Skip to main content
A knowledge base stores and organizes content — documents, files, URLs, and data from external integrations — so your workflows and agents can search and retrieve relevant information on demand. It uses semantic search to find content that matches the meaning of a query, not just exact keywords. When you query a knowledge base, it converts both the query and the stored content into numerical representations called embeddings, which capture meaning and context. It then finds the stored content whose meaning is closest to the query — so a question like “how do I reset my password?” can match a document titled “Account Recovery Steps” even if the words don’t overlap. This makes knowledge bases particularly effective for natural language queries and conversational AI use cases.

Key Use Cases

  • Answering user questions grounded in your own documents or internal content
  • Building RAG pipelines that pull relevant context before generating a response
  • Giving an agent access to product documentation, policies, or knowledge articles
  • Surfacing citations and sources alongside AI-generated responses
  • Indexing structured data like tables for retrieval alongside unstructured documents

Data Sources

File Upload Upload files directly to your knowledge base. Supported formats: doc, docx, pdf, csv, xls, xlsx, pptx, txt, md, and more. Integrations Sync content from connected external services so your knowledge base stays current without manual updates. Supported integrations include:
  • Google Drive
  • And more via the Add Integration option
Scrape URL Pull and index content from a web page by providing a URL. Workspace Files Use files already stored in your workspace without re-uploading. Index Table Index a structured table as a retrievable knowledge source alongside your unstructured content.

Using the Knowledge Base Reader Node

In workflows, the knowledge base is accessed through the Knowledge Base Reader node. Place it on the canvas, point it at a knowledge base, pass it a search query, and it returns the most relevant content for downstream nodes — typically an LLM — to consume.

Key Use Cases

  • Retrieving relevant document chunks to answer a user’s query in a Q&A pipeline
  • Feeding retrieved context into an LLM node for RAG
  • Searching across files and synced content at a specific step in a workflow
  • Using retrieved pages or documents when broader context is needed beyond individual chunks
  • Generating a direct answer from retrieved content using Advanced QA mode

How It Works

Step 1: Add the Knowledge Base Reader Node Find the Knowledge Base Reader node under the Knowledge category in the node panel and drag it onto the canvas.
Node palette showing the Knowledge category with Knowledge Base Reader node being dragged onto the canvas
Step 2: Name the Node The node defaults to knowledge_base_0. Rename it to something descriptive, especially when using multiple knowledge nodes in the same workflow.Step 3: Enter a Search Query In the Search Query field, enter the query to run against the knowledge base. Use {{ to reference a variable from an upstream node — typically an Input node carrying the user’s question.Step 4: Select or Create a Knowledge Base Use the Knowledge Base field to pick an existing knowledge base from the dropdown, or click + New Knowledge Base to create one inline. Switch to Variable mode to pass a knowledge base reference dynamically from upstream.
Knowledge Base Reader node settings panel showing Search Query, Knowledge Base selection, and configuration options
Step 5: Connect Outputs to Downstream Nodes Connect the relevant output — chunks, documents, pages, or formatted_text — to the node that will consume the retrieved content.
Knowledge Base Reader node connected to upstream and downstream nodes on the canvas

Inputs

  • Search Query * — The query used to search the knowledge base. Supports variable references via {{.
  • Knowledge Base * — The knowledge base to search. Select from a list, pass as a variable, or create inline.

Outputs

  • chunks — A list of semantically similar text segments retrieved from the knowledge base. Available when Retrieval Unit is set to Chunks. Best for fine-grained content retrieval.
  • documents — A list of full or partial documents retrieved from the knowledge base. Available when Retrieval Unit is set to Documents. Best when broader document-level context is needed.
  • pages — A list of page-level segments retrieved from the knowledge base. Available when Retrieval Unit is set to Pages. Best for page-aware retrieval over long documents.
  • citation_metadata — A list of source references for the retrieved results, used to surface citations in LLM responses.
  • formatted_text — Retrieved content pre-formatted as a single string ready for direct LLM input. Available when Format Context for LLM is enabled (on by default).
  • response — A generated answer produced from the retrieved content. Available when Do Advanced QA or Set Response Format is enabled. Becomes stream<string> when Stream Response is also enabled.

Node Options

Core Settings
  • Max Docs Per Query (Integer, default: 10) — Maximum number of results to retrieve per query.
  • Retrieval Unit (Dropdown, default: Chunks) — What unit of content is retrieved. Options: Chunks, Documents, Pages. Each produces a different output key.
  • Score Cutoff (Decimal, default: 0) — Minimum similarity score a result must meet to be returned. A value of 0 returns everything.
  • Format Context for LLM (Toggle, default: Yes) — Formats retrieved content for LLM input.
  • Alpha (Decimal, default: 0.6) — Balance between semantic and keyword search in hybrid retrieval. Higher values weight semantic search more heavily.
Advanced Settings
  • Enable Filter (Toggle, default: No) — Enables metadata-based filtering. When enabled, a filter input field appears.
  • Enable Context (Toggle, default: No) — Passes additional context with the query. When enabled, a context input field appears.
  • Rerank Documents (Toggle, default: No) — Reranks results for better relevance. When enabled:
    • Rerank Model — Model used for reranking (default: cohere/rerank-english-v3.0)
    • Num Chunks to Rerank (Integer, default: 10)
  • Transform Query (Toggle, default: No) — Rewrites the query before retrieval to improve results.
  • Answer Multiple Questions (Toggle, default: No) — Handles compound or multi-part queries in a single pass.
  • Expand Query (Toggle, default: No) — Expands the query to improve recall.
  • Expand Query Terms (Toggle, default: No) — Adds related terms to broaden search coverage.
  • Generate Metadata Filters (Toggle, default: No) — Derives metadata filters from the query automatically.
  • Do Advanced QA (beta) (Toggle, default: No) — Generates a direct answer from retrieved content. When enabled, citation_metadata is populated and a response output is added. Additional fields:
    • QA Model — Model used to generate the answer (default: gpt-4o-mini)
    • Advanced Search Modefast or accurate (default: accurate)
  • Set Response Format (Toggle, default: No) — Enables a custom response format with a System Prompt field.
  • Stream Response (Toggle, default: No) — Streams the response progressively. Only takes effect when Set Response Format is also enabled; changes response output to stream<string>.
  • Enable Document DB Filter (Toggle, default: No) — Applies document-level filters during retrieval. When enabled, a document_db_filter field appears.
  • Do NL Metadata Query (Toggle, default: No) — Enables natural language querying against document metadata.

Best Practices

  • Pass the user’s query dynamically from an Input node rather than hardcoding a search term
  • Use formatted_text for straightforward RAG pipelines; use chunks, documents, or pages when you need more control over content assembly
  • Set Score Cutoff above 0 to filter out low-relevance results
  • Use Enable List Mode (right-click the node) when processing multiple queries in a single run