Speech to Text

Transcribe a video or audio file.

The Speech-to-Text node allows you to transcribe a video or audio file into raw text.

To use the Speech-to-Text node, pass in any audio or video output to the audio handle and connect the text handle to any node that accepts text.

Under the hood, Speech-to-Text uses the OpenAI Whisper API. Whisper is an audio-to-text model that has been trained on a substantial amount of audio data. This allows it to recognize many languages with high fidelity.
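For readers curious what a Whisper transcription call looks like outside the node, here is a minimal sketch. It assumes the official `openai` Python package, the `whisper-1` model name, and an `OPENAI_API_KEY` environment variable. The `transcribe` helper is a hypothetical name used for illustration, and the node's actual internals may differ.

```python
from pathlib import Path


def transcribe(path, client=None, model="whisper-1"):
    """Return the transcript of an audio or video file as plain text.

    Hypothetical helper: a sketch of a Whisper API call, not the node's
    actual implementation.
    """
    if client is None:
        # Imported lazily so the helper can also be run with a stub client.
        from openai import OpenAI

        client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # The API takes the file as a binary stream and returns an object
    # whose `text` attribute holds the raw transcript.
    with Path(path).open("rb") as media_file:
        result = client.audio.transcriptions.create(model=model, file=media_file)
    return result.text
```

Calling `transcribe("meeting.mp3")` would return a plain string, which corresponds to what the node emits on its text handle. The file name here is only an example.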

The output text is often connected to 1) LLMs for processing, 2) the "documents" edge in the Semantic Search node, or 3) the Output node.

