Speech to Text

Transcribe a video or audio file.

The Speech-to-Text node allows you to transcribe a video or audio file into raw text.

To use the Speech-to-Text node, pass in any audio or video output to the audio handle and connect the text handle to any node that accepts text.

Under the hood, Speech-to-Text uses the OpenAI Whisper API. Whisper is an audio-to-text model trained on a substantial amount of audio data, which allows it to recognize many languages with high fidelity.
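For readers curious what a Whisper transcription request looks like outside the node, here is a minimal Python sketch. It is not the node's actual implementation. It assumes a client object shaped like the one in the official `openai` Python package, where `client.audio.transcriptions.create(...)` returns an object with a `text` attribute. The helper name `transcribe` and the default model name `whisper-1` are illustrative choices.

```python
from pathlib import Path


def transcribe(path, client, model="whisper-1"):
    """Send one audio or video file for transcription and return the raw text.

    `client` is assumed to expose `audio.transcriptions.create(model=..., file=...)`,
    as the official `openai` package's `OpenAI()` client does. `transcribe` itself
    is a hypothetical helper, not part of any library.
    """
    # The file is opened in binary mode and uploaded as-is.
    with Path(path).open("rb") as media:
        result = client.audio.transcriptions.create(model=model, file=media)
    # The response carries the transcript as plain text, mirroring the
    # node's "text" handle.
    return result.text
```

With the `openai` package installed and an API key configured, a call such as `transcribe("meeting.mp3", OpenAI())` would be expected to return the transcript as a single string, which is the same kind of raw text the node emits on its text handle.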

The output text is often connected to 1) LLMs for processing, 2) the "documents" edge in the Semantic Search node, or 3) the Output node.
