The Speech to Text node transcribes the speech in an audio file into text.
You have two options for providing an audio file:
- If the toggle is set to “Variable”, reference an audio file output by another node.
- If the toggle is set to “Upload”, upload an audio file directly to the node.
Node Inputs
- Audio: The audio to convert to text
Node Parameters
- Provider: The provider of the AI model you want to use. The default provider is OpenAI.
- Model: The name of the model you want to use.
- Use Personal API Key: Lets you enter your own API key for the selected provider.
Node Outputs
- Text: The transcribed text of the audio
  - Type: Text
  - Example usage: {{ai_speech_to_text_0.text}}
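To illustrate what a reference such as {{ai_speech_to_text_0.text}} stands for, here is a minimal Python sketch that substitutes `{{node_name.output_name}}` placeholders from a dictionary of upstream node outputs. The `resolve_references` helper and the dictionary layout are hypothetical illustrations of the idea, not the platform's actual implementation.

```python
import re

# Matches placeholders such as {{ai_speech_to_text_0.text}}:
# group 1 is the node name, group 2 is the output field.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def resolve_references(template: str, outputs: dict[str, dict[str, str]]) -> str:
    """Replace each {{node.field}} placeholder with the matching upstream output.

    Hypothetical helper: `outputs` maps node names to their output fields.
    Raises KeyError if a referenced node or field does not exist.
    """

    def substitute(match: re.Match) -> str:
        node, field = match.group(1), match.group(2)
        return outputs[node][field]

    return PLACEHOLDER.sub(substitute, template)


# Example: the transcript produced by a Speech to Text node (stub value).
outputs = {"ai_speech_to_text_0": {"text": "What is the capital of France?"}}
print(resolve_references("Answer this: {{ai_speech_to_text_0.text}}", outputs))
```

Raising on an unknown reference, rather than leaving the placeholder in place, makes a mistyped node name fail loudly in this sketch.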
Example
The example below shows a pipeline that takes audio input, converts it to text, processes the text with an LLM, and converts the LLM's response back to audio.
- Input Node: Contains the input audio
- Speech to Text Node: Converts the audio to text
- LLM Node: Processes the text and answers the question
  - Input: {{ai_speech_to_text_0.text}}
- Text to Speech Node: Converts the LLM response to audio
  - Text: {{openai_0.response}}
- Output Node: Returns the final audio response
  - Output: {{ai_text_to_speech_0.audio}}
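The data flow in this example can be sketched in Python as a chain of functions whose outputs are stored under each node's name, so each downstream node reads the same value that its {{node.field}} reference points to. The three model calls are hypothetical stubs standing in for the real providers; only the wiring between nodes is meant to mirror the pipeline.

```python
def speech_to_text(audio: bytes) -> str:
    """Stub for the Speech to Text node; a real node would call the selected provider."""
    return audio.decode("utf-8")  # pretend the "audio" bytes already hold the transcript


def llm(prompt: str) -> str:
    """Stub for the LLM node."""
    return f"You asked: {prompt}"


def text_to_speech(text: str) -> bytes:
    """Stub for the Text to Speech node."""
    return text.encode("utf-8")


def run_pipeline(input_audio: bytes) -> bytes:
    outputs: dict[str, dict] = {}
    # Speech to Text Node: converts the input audio to text.
    outputs["ai_speech_to_text_0"] = {"text": speech_to_text(input_audio)}
    # LLM Node: its Input field is {{ai_speech_to_text_0.text}}.
    outputs["openai_0"] = {"response": llm(outputs["ai_speech_to_text_0"]["text"])}
    # Text to Speech Node: its Text field is {{openai_0.response}}.
    outputs["ai_text_to_speech_0"] = {
        "audio": text_to_speech(outputs["openai_0"]["response"])
    }
    # Output Node: returns {{ai_text_to_speech_0.audio}}.
    return outputs["ai_text_to_speech_0"]["audio"]


print(run_pipeline(b"What is 2 + 2?"))
```

The order of the assignments matters: each node can only reference outputs of nodes that have already run.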
Pricing
| Provider | Model | Input cost per minute (USD) |
| --- | --- | --- |
| OpenAI | whisper-1 | 0.006 |
| Deepgram | nova-3 | 0.0043 |
| Deepgram | nova-2 | 0.0043 |
| Deepgram | nova | 0.0043 |
| Deepgram | enhanced | 0.0145 |
| Deepgram | base | 0.0123 |
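Because the prices above are quoted per minute of input audio, a rough transcription cost is the audio duration in minutes multiplied by the model's rate. For example, 10 minutes with whisper-1 comes to 10 × 0.006 = 0.06. The sketch below encodes the table's rates; it assumes simple linear per-minute billing with no rounding or minimum charge, which may not match how each provider actually bills, so treat its results as estimates. The `estimate_cost` function is a hypothetical helper, not part of the node.

```python
# Per-minute input rates copied from the pricing table above.
RATES_PER_MINUTE = {
    ("OpenAI", "whisper-1"): 0.006,
    ("Deepgram", "nova-3"): 0.0043,
    ("Deepgram", "nova-2"): 0.0043,
    ("Deepgram", "nova"): 0.0043,
    ("Deepgram", "enhanced"): 0.0145,
    ("Deepgram", "base"): 0.0123,
}


def estimate_cost(provider: str, model: str, duration_seconds: float) -> float:
    """Estimate input cost assuming linear per-minute billing (an assumption).

    Raises KeyError for a provider/model pair not in the table.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    rate = RATES_PER_MINUTE[(provider, model)]
    return rate * duration_seconds / 60


# A 10-minute (600-second) recording transcribed with whisper-1.
print(round(estimate_cost("OpenAI", "whisper-1", 600), 4))
```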