The image to text node generates text based on an image. A common use case is to extract data from an image.

For providing the image file you have two options:

  1. If toggle is on Upload: Upload a file by clicking the upload button
  2. If toggle is on Variable: Reference image files from other nodes

Node Inputs

  1. System (Instructions): Tell the AI model how to utilize the data (e.g., extract all the text from the image) or behave.
    • Type: Text
  2. Prompt: The data that is sent to the LLM.
    • Type: Text
  3. Image: The image to convert to text
    • Type: Image

Node Parameters

On the face of the node:

  1. Provider: Provider of the AI model you want to use. The default provider is OpenAI.
  2. Model: Specific model you want to use.
  3. Use Personal Api Key: This allows you to enter your API key.

In the gear:

  1. Max tokens: The maximum amount of input + output tokens the model will take in and generate per run (1 token = 4 characters). Note: different models have different token limits and the workflow will error if the max token is reached.
  2. Temperature: The diversity of the LLM generation. To have more diverse or creative generations, increase the temperature. To have a more deterministic response, decrease the temperature.
  3. Top P: The Top P parameter constrains how many tokens the LLM considers for generation at each step. For more diverse responses increase top p towards a maximum value of 1.0.
  4. Stream Response: Check to have responses from the LLM stream. Ensure to change the Type on the output node to “Streamed Text”.
  5. JSON Output: Check to ​​to have the model return a structured JSON output rather than pure text.

Node Outputs

  1. Text: The text generated from the LLM.
    • Type: Text
    • Example usage: {{ai_image_to_text_0.text}}
  2. Tokens Used: The number of tokens used for the run
    • Type: Integer
    • Example usage: {{ai_image_to_text_0.tokens_used}}

Example

The below example shows a pipeline that takes an image of food menu and converts it to text.

  1. Input Node: Contains the input image of food menu
  2. Image to Text Node: Converts the image to text
    • System: Extract all the text from the image
    • Image: {{input_0.image}}
  3. Output: The text generated from the image
    • Output: {{ai_image_to_text_0.text}}

Pricing

ProviderModelInput cost per 1000 tokensOutput cost per 1000 tokens
OpenAIgpt-4.5-preview0.0750.15
OpenAIgpt-4o0.00250.01
OpenAIgpt-4o-mini0.000150.0006
OpenAIchatgpt-4o-latest0.0050.015
OpenAIgpt-4o-2024-08-060.00250.01
OpenAIgpt-4-turbo-2024-04-090.010.03
Anthropicclaude-3-haiku-202403070.000250.00125
Anthropicclaude-3-opus-202402290.0150.075
Anthropicclaude-3-sonnet-202402290.0030.015
Anthropicclaude-3-5-sonnet-202406200.0030.015
Anthropicclaude-3-5-sonnet-202410220.0030.015
Anthropicclaude-3-7-sonnet-202502190.0030.015
Googlegemini-1.5-flash7.5e-050.0003
Googlegemini-1.5-flash-preview-05147.5e-054.6875e-06
Googlegemini-2.0-flash-exp00
Googlegemini-2.0-flash-thinking-exp00
Googlegemini-2.0-flash-lite-preview-02-057.5e-050.0003
Googlegemini-2.0-flash-0010.000150.0006
XAIgrok-2-vision0.0020.01