Splits text into chunks and returns them as a list of text. Supports several chunking strategies: markdown-aware, sentence-based, and dynamic sizing.
Node Inputs
- Text: The text to split into chunks
Node Parameters
In the gear:
- Chunk Size: The size of each chunk, in tokens (one token = 4 characters). The default value is 512 tokens. Valid values range from 1 to 4096.
- Chunk Overlap: The number of tokens shared between consecutive chunks (one token = 4 characters). The default value is 0. Valid values range from 0 to 4096.
- Chunk Strategy: The strategy for grouping segmented text into final chunks. The default option is markdown.
  - sentence: groups sentences
  - markdown: respects markdown structure (headers, code)
  - dynamic: optimizes breaks for size using the chosen segmentation method (see below)
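The interaction between Chunk Size and Chunk Overlap can be sketched in a few lines of Python. This is a minimal illustration, not the node's actual implementation: the function name `chunk_text`, the character-window approach, and the rule that the overlap must be smaller than the chunk size are all assumptions made for the example. Only the 4-characters-per-token conversion, the defaults, and the value ranges come from the parameter descriptions above.

```python
CHARS_PER_TOKEN = 4  # conversion stated in the parameter descriptions


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 0) -> list[str]:
    """Split text into character windows whose sizes are given in tokens.

    Hypothetical helper for illustration only.
    """
    if not 1 <= chunk_size <= 4096:
        raise ValueError("chunk_size must be between 1 and 4096 tokens")
    if not 0 <= chunk_overlap <= 4096:
        raise ValueError("chunk_overlap must be between 0 and 4096 tokens")
    if chunk_overlap >= chunk_size:
        # Assumption of this sketch: otherwise the window would never advance.
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    window = chunk_size * CHARS_PER_TOKEN                  # characters per chunk
    step = (chunk_size - chunk_overlap) * CHARS_PER_TOKEN  # characters advanced per chunk

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

With a chunk size of 2 tokens (8 characters) and an overlap of 1 token (4 characters), `chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=2, chunk_overlap=1)` starts with `"abcdefgh"` followed by `"efghijkl"`: each chunk repeats the last 4 characters of the previous one.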
If Dynamic is selected as the chunk strategy:
- Segmentation Method: The method used to break text into units before chunking. The default option is words.
  - words: splits by word
  - sentences: splits by sentence boundary
  - paragraphs: splits by blank line/paragraph
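One way to picture the dynamic strategy is as two steps: segment the text into units, then pack those units into chunks that stay within the chunk size. The sketch below assumes a simple greedy packing rule and regex-based sentence and paragraph splitting. The helper names `segment` and `pack` are hypothetical, and the node's real boundary detection and optimization are not documented here, so treat this as an approximation of the idea only.

```python
import re

CHARS_PER_TOKEN = 4  # conversion stated in the parameter descriptions


def segment(text: str, method: str = "words") -> list[str]:
    """Break text into units using one of the three segmentation methods."""
    if method == "words":
        return text.split()
    if method == "sentences":
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if method == "paragraphs":
        # A paragraph boundary is one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    raise ValueError(f"unknown segmentation method: {method}")


def pack(units: list[str], chunk_size: int, joiner: str = " ") -> list[str]:
    """Greedily group units into chunks of at most chunk_size tokens.

    A single unit longer than the limit is kept whole rather than cut.
    """
    limit = chunk_size * CHARS_PER_TOKEN
    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = unit if not current else current + joiner + unit
        if current and len(candidate) > limit:
            chunks.append(current)  # close the chunk before it overflows
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For example, `pack(segment(text, "sentences"), chunk_size=512)` never breaks a chunk in the middle of a sentence, whereas the words method can break between any two words.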
Node Outputs
- Chunks: The chunked text in a list
  - Type: List<Text>
  - Example usage: {{chunking_0.chunks}}
Example
The example below shows a pipeline that takes a blog post, splits it into a list of text chunks, and summarizes each chunk.
- Text Node: Contains the text
  - Text: The text from the blog
- Chunk Text Node: Splits the text into chunks based on the chunk size and overlap
- Summarizer Node: Summarizes each chunk in the list (list mode applies the operation, in this case summarization, to each item in the list)
  - List Mode: True
  - Text for summarization: {{chunking_0.chunks}}
- Output: Displays the list of summaries
  - Output: {{summarizer_0.summary}}
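The data flow of this pipeline can be mimicked in plain Python to show what list mode does: the same operation is applied to every item in the list, and the result is a list of the same length. Everything in the sketch is a stand-in. `chunk_words`, `apply_list_mode`, and `placeholder_summarize` are hypothetical names, and the "summary" is just a truncation, since the real Summarizer Node calls a model.

```python
from typing import Callable


def chunk_words(text: str, words_per_chunk: int) -> list[str]:
    """Stand-in for the Chunk Text Node: fixed-size groups of words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def apply_list_mode(operation: Callable[[str], str], items: list[str]) -> list[str]:
    """List mode: run the operation once per item and collect the results."""
    return [operation(item) for item in items]


def placeholder_summarize(chunk: str, max_words: int = 2) -> str:
    """Stand-in for the Summarizer Node: keeps only the first few words."""
    return " ".join(chunk.split()[:max_words])


blog_text = "one two three four five six seven"         # Text Node
chunks = chunk_words(blog_text, words_per_chunk=3)       # {{chunking_0.chunks}}
summaries = apply_list_mode(placeholder_summarize, chunks)  # {{summarizer_0.summary}}
```

Because list mode maps over the input, `summaries` holds one entry per chunk, in the same order as `chunks`.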
