Speech To Text Node
Generate text from an audio file
The Speech to Text node transcribes an audio file into text.
There are two ways to provide the audio file, selected with the toggle on the node:
- Variable: reference an audio file output by another node
- Upload: upload an audio file directly on the node
Node Inputs
- Audio: The audio for conversion
  - Type: Audio
Node Parameters
- Provider: Provider of the AI model you want to use. The default provider is OpenAI.
- Model: Name of the model you want to use.
- Use Personal Api Key: Enable this to enter your own API key.
Node Outputs
- Text: The audio converted to text
  - Type: Text
  - Example usage: {{ai_speech_to_text_0.text}}
Example
The example below shows a pipeline that takes audio input, converts it to text, processes the text with an LLM, and converts the response back to audio.
- Input Node: Contains the input audio
- Speech to Text Node: Converts the audio to text
  - Audio: {{input_0.audio}}
- LLM Node: Processes the text and answers the question
  - Input: {{ai_speech_to_text_0.text}}
- Text to Speech Node: Converts the LLM response to audio
  - Text: {{openai_0.response}}
- Output Node: The final audio response
  - Output: {{ai_text_to_speech_0.audio}}
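
To make the wiring above concrete, here is a minimal Python sketch of how each `{{node.field}}` reference could pull a value from an earlier node's output. It is only an illustration of the data flow, not the platform's implementation: the `resolve` helper, the `outputs` dictionary, the file name, and the three stand-in node functions are all hypothetical and simply return labelled strings.

```python
import re

# Hypothetical store of node outputs: node name -> field name -> value.
outputs = {"input_0": {"audio": "question.wav"}}

# Matches references such as {{input_0.audio}}.
REF = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def resolve(template, outputs):
    """Replace each {{node.field}} reference with the stored value."""
    return REF.sub(lambda m: str(outputs[m.group(1)][m.group(2)]), template)


# Stand-in node functions (assumptions): a real pipeline would call a
# speech-to-text model, an LLM, and a text-to-speech model here.
def speech_to_text(audio):
    return f"transcript of {audio}"


def llm(text):
    return f"answer to [{text}]"


def text_to_speech(text):
    return f"audio of [{text}]"


# Run the nodes in pipeline order, storing each output under its node name.
outputs["ai_speech_to_text_0"] = {
    "text": speech_to_text(resolve("{{input_0.audio}}", outputs))
}
outputs["openai_0"] = {
    "response": llm(resolve("{{ai_speech_to_text_0.text}}", outputs))
}
outputs["ai_text_to_speech_0"] = {
    "audio": text_to_speech(resolve("{{openai_0.response}}", outputs))
}

# The output node reads the final audio from the Text to Speech node.
final_output = resolve("{{ai_text_to_speech_0.audio}}", outputs)
```

Each node only sees the values it references, so the order of execution follows the references: a node can run once every node it points to has produced its output.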
Pricing
Provider | Model | Input cost per minute (USD) |
---|---|---|
OpenAI | whisper-1 | 0.006 |
Deepgram | nova-3 | 0.0043 |
Deepgram | nova-2 | 0.0043 |
Deepgram | nova | 0.0043 |
Deepgram | enhanced | 0.0145 |
Deepgram | base | 0.0123 |
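
As a rough guide to reading the table, the sketch below estimates the cost of transcribing one file by multiplying the per-minute rate by the audio duration. The rates are copied from the table; the `estimate_cost` function is hypothetical, and it assumes billing is prorated by the second, which the table does not state (a provider may round durations differently).

```python
from decimal import Decimal

# Per-minute input rates copied from the pricing table above.
RATES_PER_MINUTE = {
    ("OpenAI", "whisper-1"): Decimal("0.006"),
    ("Deepgram", "nova-3"): Decimal("0.0043"),
    ("Deepgram", "nova-2"): Decimal("0.0043"),
    ("Deepgram", "nova"): Decimal("0.0043"),
    ("Deepgram", "enhanced"): Decimal("0.0145"),
    ("Deepgram", "base"): Decimal("0.0123"),
}


def estimate_cost(provider, model, duration_seconds):
    """Estimate the cost of one file, in the same units as the table.

    Assumption: the duration is billed pro rata, with no rounding up
    to whole minutes.
    """
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return RATES_PER_MINUTE[(provider, model)] * minutes


# A 90-second clip with whisper-1 is 1.5 minutes at 0.006 per minute.
short_clip = estimate_cost("OpenAI", "whisper-1", 90)

# A 10-minute recording with Deepgram nova-2.
long_clip = estimate_cost("Deepgram", "nova-2", 600)
```

`Decimal` is used instead of floats so that small per-minute rates do not pick up binary rounding error when many files are summed.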