Image to Text Node
Generate text from an image
The Image to Text node generates text from an image. A common use case is extracting data from an image.
You can provide the image file in two ways:
- If the toggle is set to Upload: upload a file by clicking the upload button.
- If the toggle is set to Variable: reference an image file from another node.
Node Inputs
- System (Instructions): Tells the AI model how to behave or how to use the data (e.g., extract all the text from the image).
  - Type: Text
- Prompt: The data that is sent to the LLM.
  - Type: Text
- Image: The image to convert to text.
  - Type: Image
Node Parameters
On the face of the node:
- Provider: Provider of the AI model you want to use. The default provider is OpenAI.
- Model: Specific model you want to use.
- Use Personal Api Key: Lets you enter your own API key.
In the gear:
- Max tokens: The maximum number of input plus output tokens the model will take in and generate per run (1 token is roughly 4 characters). Note: different models have different token limits, and the workflow will error if the limit is reached.
- Temperature: Controls the diversity of the LLM's generations. Increase the temperature for more diverse or creative output; decrease it for more deterministic responses.
- Top P: Constrains how many tokens the LLM considers at each generation step. For more diverse responses, increase Top P toward its maximum value of 1.0.
- Stream Response: Check to have responses from the LLM stream. Make sure to change the Type on the output node to “Streamed Text”.
- JSON Output: Check to have the model return structured JSON output rather than plain text.
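To make the effect of Temperature and Top P more concrete, here is a minimal Python sketch over a toy next-token distribution. It is only an illustration of the general idea: the token names and logit values are invented, the function names are hypothetical, and each provider may implement sampling differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(logits, exps)}

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalise the survivors

# Hypothetical logits for the next token (made-up values).
toy_logits = {"pizza": 2.0, "pasta": 1.0, "salad": 0.5, "soup": 0.1}

sharp = softmax_with_temperature(toy_logits, temperature=0.2)  # near-deterministic
flat = softmax_with_temperature(toy_logits, temperature=1.5)   # more diverse
narrow = top_p_filter(softmax_with_temperature(toy_logits, 1.0), top_p=0.5)
wide = top_p_filter(softmax_with_temperature(toy_logits, 1.0), top_p=1.0)

print(len(narrow), len(wide))  # candidate tokens left after filtering
```

With a low temperature almost all probability mass sits on the top token, while a Top P of 1.0 leaves every candidate token available.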
Node Outputs
- Text: The text generated by the LLM.
  - Type: Text
  - Example usage: {{ai_image_to_text_0.text}}
- Tokens Used: The number of tokens used for the run.
  - Type: Integer
  - Example usage: {{ai_image_to_text_0.tokens_used}}
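Output references follow the pattern `{{node_name.field}}`. As a rough mental model, and not the platform's actual implementation, a downstream node can be thought of as replacing each reference with the matching value from an earlier node. The `resolve_references` helper and the sample values below are hypothetical.

```python
import re

# Matches references such as {{ai_image_to_text_0.text}}.
REFERENCE = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")

def resolve_references(template, node_outputs):
    """Replace each {{node.field}} reference with the value found in node_outputs."""
    def substitute(match):
        node, field = match.group(1), match.group(2)
        return str(node_outputs[node][field])
    return REFERENCE.sub(substitute, template)

# Hypothetical outputs from one run of the node (invented values).
outputs = {"ai_image_to_text_0": {"text": "Margherita 12.00", "tokens_used": 431}}

message = resolve_references(
    "Menu: {{ai_image_to_text_0.text}} ({{ai_image_to_text_0.tokens_used}} tokens)",
    outputs,
)
print(message)  # Menu: Margherita 12.00 (431 tokens)
```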
Example
The example below shows a pipeline that takes an image of a food menu and converts it to text.
- Input Node: Contains the input image of the food menu
- Image to Text Node: Converts the image to text
  - System: Extract all the text from the image
  - Image: {{input_0.image}}
- Output: The text generated from the image
  - Output: {{ai_image_to_text_0.text}}
Pricing
Provider | Model | Input cost per 1000 tokens | Output cost per 1000 tokens |
---|---|---|---|
OpenAI | gpt-4.5-preview | 0.075 | 0.15 |
OpenAI | gpt-4o | 0.0025 | 0.01 |
OpenAI | gpt-4o-mini | 0.00015 | 0.0006 |
OpenAI | chatgpt-4o-latest | 0.005 | 0.015 |
OpenAI | gpt-4o-2024-08-06 | 0.0025 | 0.01 |
OpenAI | gpt-4-turbo-2024-04-09 | 0.01 | 0.03 |
Anthropic | claude-3-haiku-20240307 | 0.00025 | 0.00125 |
Anthropic | claude-3-opus-20240229 | 0.015 | 0.075 |
Anthropic | claude-3-sonnet-20240229 | 0.003 | 0.015 |
Anthropic | claude-3-5-sonnet-20240620 | 0.003 | 0.015 |
Anthropic | claude-3-5-sonnet-20241022 | 0.003 | 0.015 |
Anthropic | claude-3-7-sonnet-20250219 | 0.003 | 0.015 |
Google | gemini-1.5-flash | 0.000075 | 0.0003 |
Google | gemini-1.5-flash-preview-0514 | 0.000075 | 0.0000046875 |
Google | gemini-2.0-flash-exp | 0 | 0 |
Google | gemini-2.0-flash-thinking-exp | 0 | 0 |
Google | gemini-2.0-flash-lite-preview-02-05 | 0.000075 | 0.0003 |
Google | gemini-2.0-flash-001 | 0.00015 | 0.0006 |
XAI | grok-2-vision | 0.002 | 0.01 |
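Because prices are listed per 1000 tokens, the cost of a run can be estimated as (input tokens / 1000 × input price) + (output tokens / 1000 × output price). The sketch below uses two rows from the table. The token counts are invented for illustration, the `estimate_cost` function is hypothetical, and the input/output split is an assumption, since the Tokens Used output reports a single total. The table does not state a currency, so the result is in the same unit as the listed prices.

```python
# Prices per 1000 tokens, copied from the table above: (input, output).
PRICES = {
    "gpt-4o": (0.0025, 0.01),
    "gpt-4o-mini": (0.00015, 0.0006),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the cost of one run from token counts and per-1000-token prices."""
    input_price, output_price = PRICES[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price

# Hypothetical run: 1200 input tokens and 300 output tokens.
cost = estimate_cost("gpt-4o", input_tokens=1200, output_tokens=300)
print(f"{cost:.4f}")  # 0.0060
```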