Speech To Text Node - VectorShift

Speech To Text

The Speech to Text node transcribes an audio file into text. You can provide the audio file in one of two ways:

If the toggle is set to “Variable”, reference an audio file from another node.
If the toggle is set to “Upload”, upload an audio file directly on the node.

Node Inputs

Audio: The audio file to convert to text
- Type: Audio

Node Parameters

Provider: The provider of the AI model you want to use. The default provider is OpenAI.
Model: The name of the model you want to use.
Use Personal Api Key: Allows you to enter your own API key.

Node Outputs

Text: The audio converted to text
- Type: Text
- Example usage: {{ai_speech_to_text_0.text}}
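
The example usage above appears to follow a `{{node_name.output_field}}` pattern. As a minimal sketch of that reading, and not VectorShift's actual parser, the Python below splits such a reference into its node name and output field. The function name `parse_reference` and the regular expression are assumptions made for illustration.

```python
import re

# Assumed pattern: two identifiers separated by a dot inside double braces.
# This is inferred from the examples on this page, not from a published grammar.
REFERENCE_PATTERN = re.compile(r"^\{\{\s*(\w+)\.(\w+)\s*\}\}$")


def parse_reference(reference: str) -> tuple[str, str]:
    """Split a '{{node.field}}' reference into (node_name, field_name)."""
    match = REFERENCE_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"not a node reference: {reference!r}")
    return match.group(1), match.group(2)


node_name, field_name = parse_reference("{{ai_speech_to_text_0.text}}")
print(node_name, field_name)  # ai_speech_to_text_0 text
```

Read this way, `ai_speech_to_text_0` identifies the node and `text` selects its Text output.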

Example

The example below shows a pipeline that takes audio input, converts it to text, processes the text with an LLM, and converts the response back to audio.

Input Node: Contains the input audio
Speech to Text Node: Converts the audio to text
- Audio: {{input_0.audio}}
LLM Node: Processes the text or answers the question
- Input: {{ai_speech_to_text_0.text}}
Text to Speech Node: Converts the LLM response to audio
- Text: {{openai_0.response}}
Output Node: Returns the final audio response
- Output: {{ai_text_to_speech_0.audio}}

[Figure: Speech to Text Example]
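
To make the data flow in this example concrete, the sketch below simulates the wiring in plain Python. It does not call VectorShift or any speech or LLM service: the three `fake_*` functions, `resolve`, and `run_pipeline` are hypothetical stand-ins, and only the node names and `{{...}}` references are taken from the example. It assumes each node runs after the node it references and that a reference is replaced by the referenced output value.

```python
import re
from typing import Callable, Dict, List, Tuple

Outputs = Dict[str, Dict[str, str]]  # node name -> {output field -> value}


def resolve(template: str, outputs: Outputs) -> str:
    """Replace every '{{node.field}}' in template with the stored output value."""
    def lookup(match: re.Match) -> str:
        return outputs[match.group(1)][match.group(2)]
    return re.sub(r"\{\{\s*(\w+)\.(\w+)\s*\}\}", lookup, template)


# Stand-ins for the real nodes: each takes one resolved input string and
# returns its output fields. No real transcription, LLM call, or synthesis happens.
def fake_speech_to_text(audio: str) -> Dict[str, str]:
    return {"text": f"transcript of {audio}"}


def fake_llm(prompt: str) -> Dict[str, str]:
    return {"response": f"answer to [{prompt}]"}


def fake_text_to_speech(text: str) -> Dict[str, str]:
    return {"audio": f"audio of [{text}]"}


# (node name, input reference, node function), in execution order.
PIPELINE: List[Tuple[str, str, Callable[[str], Dict[str, str]]]] = [
    ("ai_speech_to_text_0", "{{input_0.audio}}", fake_speech_to_text),
    ("openai_0", "{{ai_speech_to_text_0.text}}", fake_llm),
    ("ai_text_to_speech_0", "{{openai_0.response}}", fake_text_to_speech),
]


def run_pipeline(input_audio: str) -> str:
    """Run the nodes in order, storing each node's outputs for later references."""
    outputs: Outputs = {"input_0": {"audio": input_audio}}
    for name, template, node in PIPELINE:
        outputs[name] = node(resolve(template, outputs))
    # The Output node reads the Text to Speech node's audio field.
    return resolve("{{ai_text_to_speech_0.audio}}", outputs)


print(run_pipeline("question.wav"))
# audio of [answer to [transcript of question.wav]]
```

The printed string nests each stage's result inside the next, which shows the order in which the references are resolved.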

Pricing

Provider	Model	Input cost per minute
OpenAI	whisper-1	0.006
Deepgram	nova-3	0.0043
Deepgram	nova-2	0.0043
Deepgram	nova	0.0043
Deepgram	enhanced	0.0145
Deepgram	base	0.0123
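
As a rough illustration of how the per-minute rates translate into a cost for a given recording, the sketch below multiplies the table rate by the audio duration. It assumes cost scales linearly with duration and that partial minutes are not rounded up; the page does not state the billing granularity or the currency, so treat the result as an estimate. The names `RATE_PER_MINUTE` and `estimate_cost` are illustrative only.

```python
# Per-minute rates copied from the pricing table above.
# The table does not state a currency, so none is assumed here.
RATE_PER_MINUTE = {
    ("OpenAI", "whisper-1"): 0.006,
    ("Deepgram", "nova-3"): 0.0043,
    ("Deepgram", "nova-2"): 0.0043,
    ("Deepgram", "nova"): 0.0043,
    ("Deepgram", "enhanced"): 0.0145,
    ("Deepgram", "base"): 0.0123,
}


def estimate_cost(provider: str, model: str, audio_seconds: float) -> float:
    """Estimate the cost of one transcription, assuming linear, unrounded billing."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    rate = RATE_PER_MINUTE[(provider, model)]  # KeyError for unlisted pairs
    return rate * audio_seconds / 60


# A 10-minute recording with whisper-1, and a 30-minute one with nova-3.
print(round(estimate_cost("OpenAI", "whisper-1", 600), 4))   # 0.06
print(round(estimate_cost("Deepgram", "nova-3", 1800), 4))   # 0.129
```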
