The Text to Speech node generates audio from text.

Node Inputs

  1. Text: The text to convert to audio
    • Type: Text

Node Parameters

  1. Provider: The provider of the text-to-speech model. The default provider is OpenAI.
  2. Model: The specific model to use from the selected provider.
  3. Voice: The voice used for the generated audio.
  4. Use Personal Api Key: Lets you enter your own API key for the selected provider.

Node Outputs

  1. Audio: The text converted to audio
    • Type: Audio
    • Example usage: {{ai_text_to_speech_0.audio}}
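Other nodes consume this output through a `{{node_name.output_name}}` reference, as in the example usage above. The sketch below only illustrates how such a reference splits into a node name and an output name. The regular expression and the `parse_reference` helper are hypothetical and assume names made of letters, digits, and underscores with optional spaces inside the braces; this is not VectorShift's own parser.

```python
import re

# Hypothetical pattern for references such as {{ai_text_to_speech_0.audio}}.
# Assumes node and output names use letters, digits, and underscores, and
# tolerates optional spaces just inside the braces.
REFERENCE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def parse_reference(text: str) -> tuple[str, str]:
    """Split one {{node.output}} reference into (node_name, output_name)."""
    match = REFERENCE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a node reference: {text!r}")
    return match.group(1), match.group(2)


print(parse_reference("{{ai_text_to_speech_0.audio}}"))
# ('ai_text_to_speech_0', 'audio')
```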

Example

The example below shows a pipeline that takes audio input, converts it to text, processes it with an LLM, and converts the response back to audio.

  1. Input Node: Contains the input audio (recorded through the VectorShift platform)
  2. Speech to Text Node: Converts the audio to text
    • Audio: {{input_0.audio}}
  3. LLM Node: Processes the text (for example, answers the user's question)
    • Input: {{ai_speech_to_text_0.text}}
  4. Text to Speech Node: Converts the LLM response to audio
    • Text: {{openai_0.response}}
  5. Output Node: Returns the final audio response
    • Output: {{ai_text_to_speech_0.audio}}
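Each reference in the example ties a node to the output of an earlier node, so the references alone determine the order in which the nodes can run. The sketch below models the example pipeline as a plain dictionary and derives that order with Python's standard `graphlib`. The dictionary layout and the `execution_order` helper are hypothetical, and `output_0` is an assumed name for the Output node, which the example does not name; this only illustrates the data flow, not how VectorShift schedules pipelines.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model of the example pipeline: node name -> its input values,
# written with the same {{node.output}} references. "output_0" is an assumed name.
PIPELINE = {
    "input_0": [],
    "ai_speech_to_text_0": ["{{input_0.audio}}"],
    "openai_0": ["{{ai_speech_to_text_0.text}}"],
    "ai_text_to_speech_0": ["{{openai_0.response}}"],
    "output_0": ["{{ai_text_to_speech_0.audio}}"],
}

# Captures the node name at the start of each {{node.output}} reference.
NODE_NAME = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\.")


def execution_order(pipeline: dict[str, list[str]]) -> list[str]:
    """Return an order in which each node runs after every node it references."""
    # TopologicalSorter expects node -> set of predecessors.
    graph = {
        node: {name for value in inputs for name in NODE_NAME.findall(value)}
        for node, inputs in pipeline.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(PIPELINE))
# ['input_0', 'ai_speech_to_text_0', 'openai_0', 'ai_text_to_speech_0', 'output_0']
```

Because this pipeline is a single chain, only one valid order exists. In a branching pipeline, `static_order` may return any order that respects the references.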

Pricing

Provider | Model | Input cost per 1,000 characters
OpenAI | tts-1 | 0.015
OpenAI | tts-1-hd | 0.03
ElevenLabs | eleven_monolingual_v1 | 0.11
ElevenLabs | eleven_multilingual_v1 | 0.11
ElevenLabs | eleven_multilingual_v2 | 0.11
ElevenLabs | eleven_turbo_v2 | 0.055
ElevenLabs | eleven_turbo_v2_5 | 0.055
ElevenLabs | eleven_flash_v2_5 | 0.055
ElevenLabs | eleven_flash_v2 | 0.055
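Because pricing is quoted per 1,000 input characters, a run's cost is roughly the character count divided by 1,000, multiplied by the model's rate. The sketch below applies that arithmetic using the rates from the pricing table. The `estimate_cost` helper is hypothetical, and it assumes billing is strictly linear with no rounding or minimum charge, and that characters are counted the way Python's `len()` counts them (Unicode code points), which may differ from the provider's own count.

```python
# Input cost per 1,000 characters, taken from the pricing table.
PRICE_PER_1000_CHARS = {
    ("OpenAI", "tts-1"): 0.015,
    ("OpenAI", "tts-1-hd"): 0.03,
    ("ElevenLabs", "eleven_monolingual_v1"): 0.11,
    ("ElevenLabs", "eleven_multilingual_v1"): 0.11,
    ("ElevenLabs", "eleven_multilingual_v2"): 0.11,
    ("ElevenLabs", "eleven_turbo_v2"): 0.055,
    ("ElevenLabs", "eleven_turbo_v2_5"): 0.055,
    ("ElevenLabs", "eleven_flash_v2_5"): 0.055,
    ("ElevenLabs", "eleven_flash_v2"): 0.055,
}


def estimate_cost(provider: str, model: str, text: str) -> float:
    """Estimate input cost, assuming linear per-character billing with no minimum."""
    try:
        rate = PRICE_PER_1000_CHARS[(provider, model)]
    except KeyError:
        raise ValueError(f"no price listed for {provider} / {model}") from None
    # len() counts Unicode code points; the provider may count differently.
    return len(text) / 1000 * rate


print(f"{estimate_cost('OpenAI', 'tts-1', 'a' * 2000):.4f}")
# 0.0300
```

For example, a 2,000-character response costs about twice the listed rate: 0.03 on tts-1, or 0.22 on eleven_multilingual_v2.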