The Speech to Text node transcribes an audio file into text.

You have two options for providing an audio file.

  1. If the toggle is set to “Variable”, reference an audio file from another node
  2. If the toggle is set to “Upload”, upload an audio file directly on the node

Node Inputs

  1. Audio: The audio to convert to text
    • Type: Audio

Node Parameters

  1. Provider: The provider of the AI model you want to use. The default provider is OpenAI.
  2. Model: The name of the model you want to use.
  3. Use Personal Api Key: Lets you enter your own API key.

Node Outputs

  1. Text: The audio converted to text
    • Type: Text
    • Example usage: {{ai_speech_to_text_0.text}}

Example

The example below shows a pipeline that takes audio input, converts it to text, processes the text with an LLM, and converts the response back to audio.

  1. Input Node: Contains the input audio
  2. Speech to Text Node: Converts the audio to text
    • Audio: {{input_0.audio}}
  3. LLM Node: Processes the text and answers the question
    • Input: {{ai_speech_to_text_0.text}}
  4. Text to Speech Node: Converts the LLM response to audio
    • Text: {{openai_0.response}}
  5. Output Node: Contains the final audio response
    • Output: {{ai_text_to_speech_0.audio}}
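
The `{{node.field}}` references in the steps above pass one node's output to the next node's input. The following Python sketch illustrates that idea only; it is not the platform's implementation. The `resolve` function, the `outputs` dictionary, and the sample transcript are all hypothetical names and values chosen for illustration.

```python
import re

# Hypothetical stand-in for the values each node has produced so far.
outputs = {
    "input_0": {"audio": "<audio bytes>"},
    "ai_speech_to_text_0": {"text": "What is the capital of France?"},
}

# Matches references of the form {{node_name.field_name}}.
REFERENCE = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def resolve(template, outputs):
    """Replace each {{node.field}} reference with that node's output value."""
    def lookup(match):
        node, field = match.group(1), match.group(2)
        return str(outputs[node][field])

    return REFERENCE.sub(lookup, template)


# The LLM node's input is whatever the Speech to Text node produced.
llm_input = resolve("{{ai_speech_to_text_0.text}}", outputs)
print(llm_input)
```

In this sketch, a reference to a node that has not yet produced output raises a `KeyError`, which mirrors the requirement that each node in the example run after the node it reads from.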

Pricing

Provider    Model        Input cost per minute
OpenAI      whisper-1    0.006
Deepgram    nova-3       0.0043
Deepgram    nova-2       0.0043
Deepgram    nova         0.0043
Deepgram    enhanced     0.0145
Deepgram    base         0.0123
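
To estimate the input cost of a transcription, multiply the audio length in minutes by the per-minute rate for the chosen provider and model. The sketch below uses the rates from the pricing table above. The `estimate_cost` function is a hypothetical helper, and it assumes cost scales linearly with duration; actual billing may round durations differently. The result is in the same unit as the rates in the table.

```python
# Per-minute input rates copied from the pricing table.
RATES_PER_MINUTE = {
    ("OpenAI", "whisper-1"): 0.006,
    ("Deepgram", "nova-3"): 0.0043,
    ("Deepgram", "nova-2"): 0.0043,
    ("Deepgram", "nova"): 0.0043,
    ("Deepgram", "enhanced"): 0.0145,
    ("Deepgram", "base"): 0.0123,
}


def estimate_cost(provider, model, duration_seconds):
    """Estimate input cost, assuming linear (unrounded) per-minute billing."""
    minutes = duration_seconds / 60
    return RATES_PER_MINUTE[(provider, model)] * minutes


# A 10-minute recording transcribed with OpenAI whisper-1.
cost = estimate_cost("OpenAI", "whisper-1", 600)
print(f"{cost:.4f}")
```

For a 10-minute recording with whisper-1 this works out to 10 × 0.006 = 0.06.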