The Image to Text node generates text based on an image. A common use case is extracting data from an image.

You have two options for providing the image file:

  1. If the toggle is set to Upload: upload a file by clicking the upload button.
  2. If the toggle is set to Variable: reference an image file from another node (see the example below).
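
With the Variable option, the image field takes a variable reference in the same double-curly-brace syntax used for node outputs elsewhere on this page. The node name below is only a placeholder; the exact reference depends on the upstream node that produces the image:

  {{input_image_0.image}}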

Node Inputs

  1. System (Instructions): Tells the AI model how to behave or how to use the data (e.g., extract all the text from the image).
    • Type: Text
  2. Prompt: The prompt text sent to the LLM alongside the image.
    • Type: Text
  3. Image: The image to convert to text (see the filled-in example after this list).
    • Type: Image
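
For example, to pull data off a receipt image, the inputs might be filled in as follows (the values are only illustrative, and the variable reference is a placeholder for whatever upstream node supplies the image):

  System: You are a data-extraction assistant. Extract all of the text from the image.
  Prompt: List the merchant name, purchase date, and total amount from this receipt.
  Image: {{input_image_0.image}}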

Node Parameters

On the face of the node:

  1. Provider: The provider of the AI model you want to use. The default is OpenAI.
  2. Model: The specific model you want to use.
  3. Use Personal API Key: Allows you to enter your own API key.

In the gear:

  1. Max tokens: The maximum number of input + output tokens the model will take in and generate per run (1 token ≈ 4 characters). Note: different models have different token limits, and the workflow will error if the limit is exceeded.
  2. Temperature: Controls the diversity of the LLM's generation. Increase the temperature for more diverse or creative output; decrease it for a more deterministic response.
  3. Top P: Constrains how many tokens the LLM considers at each generation step. Increase Top P toward its maximum value of 1.0 for more diverse responses.
  4. Stream Response: Check to have the LLM stream its response. Be sure to change the Type on the output node to “Streamed Text”.
  5. JSON Output: Check to have the model return structured JSON output rather than plain text. See the sketch after this list for how these settings map to a typical API call.
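
The gear settings correspond to standard LLM chat-completion parameters. Purely as an illustration (this is not how the tool is implemented internally, and the file name, model, and values below are made up), here is a minimal sketch using the OpenAI Python SDK that shows where each setting would land in a direct API call to a vision-capable model:

  import base64
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment (or your personal key)

  # Encode a local image as a base64 data URL; the node handles this step for you.
  with open("receipt.jpg", "rb") as f:
      image_data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

  response = client.chat.completions.create(
      model="gpt-4o",                            # Model
      max_tokens=1024,                           # Max tokens (caps output tokens in this API)
      temperature=0.2,                           # Temperature
      top_p=1.0,                                 # Top P
      stream=False,                              # Stream Response
      response_format={"type": "json_object"},   # JSON Output
      messages=[
          {"role": "system",
           "content": "Extract all text from the image and return it as JSON."},
          {"role": "user",
           "content": [
               {"type": "text", "text": "List the merchant, date, and total."},
               {"type": "image_url", "image_url": {"url": image_data_url}},
           ]},
      ],
  )
  print(response.choices[0].message.content)

Note that in this particular API, max_tokens caps only the output, whereas the node's Max tokens setting described above covers input + output combined.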

Node Outputs

  1. Text: The text generated by the LLM (see the downstream usage example after this list).
    • Type: Text
    • Example usage: {{ai_image_to_text_0.text}}
  2. Tokens Used: The number of tokens consumed by the run.
    • Type: Integer
    • Example usage: {{ai_image_to_text_0.tokens_used}}
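
Downstream nodes reference these outputs with the same double-curly-brace syntax. For example, a later LLM node could summarize the extracted text:

  Summarize the following text extracted from the image: {{ai_image_to_text_0.text}}

and {{ai_image_to_text_0.tokens_used}} can be passed to an output or logging node to track usage.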