Speech to Text
Transcribe a video or audio file.
The Speech-to-Text node allows you to transcribe a video or audio file into raw text.
To use the Speech-to-Text node, pass any audio or video output to the audio handle and connect the text handle to any node that accepts text.
Under the hood, Speech-to-Text uses the OpenAI Whisper API. Whisper is an audio-to-text model that has been trained on a substantial amount of audio data, which allows it to recognize many languages with high fidelity.
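For readers curious what a Whisper transcription call looks like outside the node, here is a minimal Python sketch. It assumes the official `openai` Python package and its `whisper-1` model; the `transcribe` helper and its parameters are hypothetical names chosen for this example, and the node's actual implementation may differ.

```python
from pathlib import Path
from typing import Any, Optional


def transcribe(media_path: str, client: Optional[Any] = None,
               model: str = "whisper-1") -> str:
    """Return the raw transcript text for an audio or video file.

    `client` is any object exposing `audio.transcriptions.create(...)`.
    If omitted, an OpenAI client is created (assumes the `openai` package
    is installed and OPENAI_API_KEY is set in the environment).
    """
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        from openai import OpenAI
        client = OpenAI()

    # The API expects the media as a binary file object.
    with Path(media_path).open("rb") as media:
        result = client.audio.transcriptions.create(model=model, file=media)

    # The default response carries the transcript in its `text` attribute.
    return result.text
```

Calling `transcribe("interview.mp3")` would return the transcript as a plain string, which corresponds to what the node emits on its text handle. The file name here is only a placeholder.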
The output text is often connected to 1) LLMs for processing, 2) the "documents" edge in the Semantic Search node, or 3) the Output node.