Chunk Text Node

Splits text into a list of chunks. Supports several chunking strategies: markdown-aware, sentence-based, and dynamic sizing.

Node Inputs

In the gear:

Chunk Size: The size of each chunk, in tokens. One token = 4 characters. The default value is 512 tokens. The value ranges from 1 to 4096.
- Type: Text
Chunk Overlap: The number of tokens shared between consecutive chunks. One token = 4 characters. The default value is 0. The value ranges from 0 to 4096.
- Type: Text
Chunk Strategy: The strategy for grouping segmented text into final chunks. sentence: groups sentences; markdown: respects markdown structure (headers, code); dynamic: optimizes breaks for size using the chosen segmentation method (see below). The default option is Markdown.
- Type: Dropdown
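
To illustrate how Chunk Size and Chunk Overlap interact, here is a minimal sketch of a sliding-window splitter. It is not the node's actual implementation: the function name `chunk_by_size`, the character-based window, and the requirement that the overlap be smaller than the chunk size are all assumptions made for the example. Only the 4-characters-per-token rule and the value ranges come from the settings above.

```python
CHARS_PER_TOKEN = 4  # conversion stated in the settings above


def chunk_by_size(text: str, chunk_size: int = 512, chunk_overlap: int = 0) -> list[str]:
    """Hypothetical sliding-window chunker; sizes are given in tokens."""
    if not 1 <= chunk_size <= 4096:
        raise ValueError("chunk_size must be between 1 and 4096")
    # Assumption: the overlap must be smaller than the chunk size,
    # otherwise the window would never move forward.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    window = chunk_size * CHARS_PER_TOKEN                  # chunk length in characters
    step = (chunk_size - chunk_overlap) * CHARS_PER_TOKEN  # distance between chunk starts

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


# 2-token chunks (8 characters) that share 1 token (4 characters)
example = chunk_by_size("abcdefghijklmnop", chunk_size=2, chunk_overlap=1)
```

With an overlap of 0, the chunks tile the text with no shared characters. A larger overlap repeats the end of each chunk at the start of the next, which can help preserve context across chunk boundaries at the cost of producing more chunks.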

If Dynamic is selected as the chunk strategy:

Segmentation Method: The method used to break text into units before chunking. words: splits by word; sentences: splits by sentence boundary; paragraphs: splits by blank line/paragraph. The default option is words.
- Type: Dropdown
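
The two-step idea behind the Dynamic strategy (segment first, then group units into chunks) can be sketched as follows. This is an illustrative approximation only: the helper names `segment` and `pack`, the regular expressions, and the greedy packing rule are assumptions and may differ from how the node actually segments and optimizes breaks.

```python
import re


def segment(text: str, method: str = "words") -> list[str]:
    """Break text into units using one of the three segmentation methods."""
    if method == "words":
        return text.split()
    if method == "sentences":
        # Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if method == "paragraphs":
        # Paragraphs are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    raise ValueError(f"unknown segmentation method: {method}")


def pack(units: list[str], chunk_size: int = 512, chars_per_token: int = 4) -> list[str]:
    """Greedily join units into chunks that stay within the size limit."""
    limit = chunk_size * chars_per_token
    chunks, current = [], ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if current and len(candidate) > limit:
            chunks.append(current)  # close the chunk before it overflows
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sentences = segment("One. Two! Three?", method="sentences")
packed = pack(["aaaa", "bbbb", "cccc"], chunk_size=3)
```

In this sketch a unit is never split, so a single unit longer than the limit becomes its own oversized chunk. Coarser units (paragraphs) keep related text together, while finer units (words) let chunks fill closer to the size limit.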

Node Outputs

Chunks: The text chunks, as a list
- Type: List<Text>
- Example usage: {{chunking_0.chunks}}

The example below shows a pipeline that takes a blog post, splits it into a list of text chunks, and summarizes each chunk.

Text Node: Contains the text
- Text: The text from the blog
Chunk Text Node: Splits the text into chunks of text based on the chunk size and overlap
- Text: {{text_0.text}}
Summarizer Node: Summarizes each chunk in the list (list mode applies the operation, in this case summarization, to each item in the list)
- List Mode: True
- Text for summarization: {{chunking_0.chunks}}
Output: Display the list of summaries
- Output: {{summarizer_0.summary}}
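
The data flow of this pipeline can be approximated in plain Python. The sketch below is not platform code: `chunk_text`, `summarize`, and `run_pipeline` are hypothetical stand-ins, and the "summary" is just the first few words of each chunk so that the example runs without a language model. The point it illustrates is that list mode behaves like mapping one operation over every item in the list.

```python
def chunk_text(text: str, chunk_chars: int = 40) -> list[str]:
    """Stand-in for the Chunk Text Node: fixed-size chunks with no overlap."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def summarize(chunk: str, max_words: int = 5) -> str:
    """Stand-in for the Summarizer Node: keeps only the first few words."""
    return " ".join(chunk.split()[:max_words])


def run_pipeline(blog_text: str) -> list[str]:
    chunks = chunk_text(blog_text)         # plays the role of {{chunking_0.chunks}}
    return [summarize(c) for c in chunks]  # list mode: one summary per chunk


summaries = run_pipeline("word " * 20)
```

The result is a list with one entry per chunk, which corresponds to what the Output node displays.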