Image to Text Node

The Image to Text node generates text from an image. A common use case is extracting data from an image. You can provide the image file in one of two ways:

If the toggle is set to Upload: Upload a file by clicking the upload button.
If the toggle is set to Variable: Reference image files from other nodes.

Node Inputs

System (Instructions): Tells the AI model how to use the data (e.g., extract all the text from the image) or how to behave.
- Type: Text
Prompt: The data that is sent to the LLM.
- Type: Text
Image: The image to convert to text.
- Type: Image

Node Parameters

On the face of the node:

Provider: Provider of the AI model you want to use. The default provider is OpenAI.
Model: Specific model you want to use.
Use Personal Api Key: Lets you enter your own API key.

In the gear:

Max tokens: The maximum number of input + output tokens the model will take in and generate per run (1 token is roughly 4 characters). Note: different models have different token limits, and the workflow will error if the max token limit is reached.
Temperature: Controls the diversity of the LLM generation. For more diverse or creative generations, increase the temperature. For a more deterministic response, decrease the temperature.
Top P: Constrains how many tokens the LLM considers for generation at each step. For more diverse responses, increase Top P towards its maximum value of 1.0.
Stream Response: Check to stream responses from the LLM. Be sure to change the Type on the output node to “Streamed Text”.
JSON Output: Check to have the model return structured JSON output rather than plain text.
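
As a rough illustration of how Temperature and Top P interact, the sketch below applies both to a toy set of token scores. It is not the platform's or any provider's implementation: the function name top_p_filter, the example tokens, and their scores are all made up for illustration, and real providers may differ in the details.

```python
import math

def top_p_filter(logits, temperature=1.0, top_p=1.0):
    """Return {token: probability} for the tokens that survive Top P filtering.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature rescales the raw scores: lower values sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Softmax, shifted by the maximum score for numerical stability.
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Keep the most likely tokens until their cumulative probability reaches top_p.
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(kept.values())
    return {tok: p / kept_total for tok, p in kept.items()}

# Made-up scores for four candidate next tokens.
logits = {"pizza": 2.0, "pasta": 1.0, "salad": 0.5, "soup": 0.1}
narrow = top_p_filter(logits, temperature=0.5, top_p=0.5)  # low diversity
wide = top_p_filter(logits, temperature=1.5, top_p=1.0)    # high diversity
print(list(narrow), list(wide))
```

With a low temperature and a low Top P only the single most likely token remains a candidate, while a higher temperature with Top P at 1.0 keeps every token in play.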

Node Outputs

Text: The text generated by the LLM.
- Type: Text
- Example usage: {{ai_image_to_text_0.text}}
Tokens Used: The number of tokens used for the run.
- Type: Integer
- Example usage: {{ai_image_to_text_0.tokens_used}}

Example

The example below shows a pipeline that takes an image of a food menu and converts it to text.

Input Node: Contains the input image of the food menu
Image to Text Node: Converts the image to text
- System: Extract all the text from the image
- Image: {{input_0.image}}
Output: The text generated from the image
- Output: {{ai_image_to_text_0.text}}
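
To make the variable references in this example concrete, here is a minimal sketch of how a {{node.field}} placeholder could be resolved against the outputs of earlier nodes. The platform's actual resolution logic is not described here, so the resolve function, the regular expression, and the sample values (menu.png, the menu text, the token count) are assumptions for illustration only.

```python
import re

# Matches references of the form {{node_name.field_name}} (assumed syntax).
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)\s*\}\}")

def resolve(template, node_outputs):
    """Replace each {{node.field}} reference with that node's output value."""
    def lookup(match):
        node, field = match.group(1), match.group(2)
        return str(node_outputs[node][field])
    return PLACEHOLDER.sub(lookup, template)

# Hypothetical outputs collected while the pipeline runs.
node_outputs = {
    "input_0": {"image": "menu.png"},
    "ai_image_to_text_0": {"text": "Margherita 9.50", "tokens_used": 412},
}

resolved = resolve("{{ai_image_to_text_0.text}}", node_outputs)
print(resolved)
```

The same lookup would apply to {{input_0.image}} on the Image to Text node and to {{ai_image_to_text_0.tokens_used}} if the token count is needed downstream.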

Pricing

Provider	Model	Input cost per 1000 tokens	Output cost per 1000 tokens
OpenAI	gpt-4.5-preview	0.075	0.15
OpenAI	gpt-4o	0.0025	0.01
OpenAI	gpt-4o-mini	0.00015	0.0006
OpenAI	chatgpt-4o-latest	0.005	0.015
OpenAI	gpt-4o-2024-08-06	0.0025	0.01
OpenAI	gpt-4-turbo-2024-04-09	0.01	0.03
Anthropic	claude-3-haiku-20240307	0.00025	0.00125
Anthropic	claude-3-opus-20240229	0.015	0.075
Anthropic	claude-3-sonnet-20240229	0.003	0.015
Anthropic	claude-3-5-sonnet-20240620	0.003	0.015
Anthropic	claude-3-5-sonnet-20241022	0.003	0.015
Anthropic	claude-3-7-sonnet-20250219	0.003	0.015
Google	gemini-1.5-flash	0.000075	0.0003
Google	gemini-1.5-flash-preview-0514	0.000075	0.0000046875
Google	gemini-2.0-flash-exp	0	0
Google	gemini-2.0-flash-thinking-exp	0	0
Google	gemini-2.0-flash-lite-preview-02-05	0.000075	0.0003
Google	gemini-2.0-flash-001	0.00015	0.0006
XAI	grok-2-vision	0.002	0.01
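
The per-1000-token prices above can be turned into a per-run estimate with simple arithmetic. The sketch below copies three rows from the table; the estimate_cost function and the example token counts are hypothetical, and it assumes you know the split between input and output tokens, which the Tokens Used output alone does not provide. The result is in whatever currency unit the table uses.

```python
# (input price, output price) per 1000 tokens, copied from the table above.
PRICES_PER_1000_TOKENS = {
    "gpt-4o": (0.0025, 0.01),
    "gpt-4o-mini": (0.00015, 0.0006),
    "claude-3-haiku-20240307": (0.00025, 0.00125),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the cost of one run from its input and output token counts."""
    input_price, output_price = PRICES_PER_1000_TOKENS[model]
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price

# Hypothetical run: 1200 input tokens and 300 output tokens on gpt-4o.
cost = estimate_cost("gpt-4o", input_tokens=1200, output_tokens=300)
print(f"{cost:.4f}")
```

For this made-up run the input and output sides each contribute 0.003, so the estimate is about 0.006.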
