The text to speech node allows you to generate audio from text.
Node Inputs
- Text: The text to convert to audio
 
Node Parameters
- Provider: The provider of the text-to-speech model. The default provider is OpenAI.
 
- Model: Specific model you want to use.
 
- Voice: The voice for the generated audio.
 
- Use Personal Api Key: Lets you enter your own API key for the selected provider.
 
Node Outputs
- Audio: The text converted to audio
- Type: Audio
- Example usage: {{ai_text_to_speech_0.audio}}
 
Example
The example below shows a pipeline that takes audio input, converts it to text, processes the text with an LLM, and converts the LLM response back to audio.
- Input Node: Contains the input audio (recorded through the VectorShift platform)
 
- Speech to Text Node: Converts the audio to text
 
- LLM Node: Processes the text and answers the user's question
- Input: {{ai_speech_to_text_0.text}}
 
- Text to Speech Node: Converts the LLM response to audio
- Text: {{openai_0.response}}
 
- Output Node: The final audio response
- Output: {{ai_text_to_speech_0.audio}}
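
The way these references chain the nodes together can be sketched in a few lines of Python. This is only an illustration of how a `{{node_name.field}}` reference hands one node's output to the next node's input. The `resolve` helper, the `node_outputs` dictionary, and the sample strings are hypothetical and are not part of the VectorShift platform or its API.

```python
import re

# Hypothetical stand-ins for what each upstream node produced at run time.
node_outputs = {
    "ai_speech_to_text_0": {"text": "What is the capital of France?"},
    "openai_0": {"response": "The capital of France is Paris."},
}

# Matches references of the form {{node_name.field}}.
REFERENCE = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def resolve(template: str, outputs: dict) -> str:
    """Replace every {{node.field}} reference with that node's output value."""

    def lookup(match: re.Match) -> str:
        node, field = match.group(1), match.group(2)
        return str(outputs[node][field])

    return REFERENCE.sub(lookup, template)


# The LLM node's Input field receives the transcribed text.
llm_input = resolve("{{ai_speech_to_text_0.text}}", node_outputs)

# The Text to Speech node's Text field receives the LLM response.
tts_text = resolve("{{openai_0.response}}", node_outputs)

print(llm_input)
print(tts_text)
```

In the real pipeline the platform performs this substitution for you. The sketch is only meant to show which node feeds which field.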
 
Pricing
| Provider | Model | Input cost per 1,000 characters |
| --- | --- | --- |
| OpenAI | tts-1 | 0.015 | 
| OpenAI | tts-1-hd | 0.03 | 
| ElevenLabs | eleven_monolingual_v1 | 0.11 | 
| ElevenLabs | eleven_multilingual_v1 | 0.11 | 
| ElevenLabs | eleven_multilingual_v2 | 0.11 | 
| ElevenLabs | eleven_turbo_v2 | 0.055 | 
| ElevenLabs | eleven_turbo_v2_5 | 0.055 | 
| ElevenLabs | eleven_flash_v2_5 | 0.055 | 
| ElevenLabs | eleven_flash_v2 | 0.055 |
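
To estimate what a given input will cost, multiply its length in thousands of characters by the rate in the table. The sketch below assumes the cost scales linearly with character count, with no minimum charge and no rounding up to the next 1,000 characters, and that Python's `len()` matches how characters are counted for billing. Neither assumption is confirmed by the table, and `estimate_cost` is a hypothetical helper. The result is in the same unit as the table values.

```python
# Rates copied from the pricing table: cost per 1,000 input characters.
PRICE_PER_1000_CHARS = {
    ("OpenAI", "tts-1"): 0.015,
    ("OpenAI", "tts-1-hd"): 0.03,
    ("ElevenLabs", "eleven_monolingual_v1"): 0.11,
    ("ElevenLabs", "eleven_multilingual_v1"): 0.11,
    ("ElevenLabs", "eleven_multilingual_v2"): 0.11,
    ("ElevenLabs", "eleven_turbo_v2"): 0.055,
    ("ElevenLabs", "eleven_turbo_v2_5"): 0.055,
    ("ElevenLabs", "eleven_flash_v2_5"): 0.055,
    ("ElevenLabs", "eleven_flash_v2"): 0.055,
}


def estimate_cost(text: str, provider: str, model: str) -> float:
    """Estimate the cost of converting `text` to audio.

    Assumes linear per-character pricing (an assumption, see above).
    Raises KeyError if the provider/model pair is not in the table.
    """
    rate = PRICE_PER_1000_CHARS[(provider, model)]
    return len(text) / 1000 * rate


# A 2,000-character input on tts-1 comes to 2 x 0.015.
sample = "a" * 2000
print(round(estimate_cost(sample, "OpenAI", "tts-1"), 4))
```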