# Image to Text Node

> Generate text from an image

<img src="https://mintcdn.com/vectorshift/fUdgBpX7VNpDIEaX/images/platform/pipelines/multi-modal/image-to-text.png?fit=max&auto=format&n=fUdgBpX7VNpDIEaX&q=85&s=79df71c151db07c88f7bb7c26e350a52" alt="Image to Text" width="644" height="477" data-path="images/platform/pipelines/multi-modal/image-to-text.png" />

The image to text node generates text based on an image. A common use case is to extract data from an image.

You can provide the image file in one of two ways, depending on the toggle:

1. Upload: Upload a file by clicking the upload button.
2. Variable: Reference an image file from another node.

## Node Inputs

1. System (Instructions): Tell the AI model how to use the data (e.g., extract all the text from the image) or how to behave.
   * Type: `Text`
2. Prompt: The data that is sent to the LLM.
   * Type: `Text`
3. Image: The image to convert to text.
   * Type: `Image`

## Node Parameters

On the face of the node:

1. Provider: Provider of the AI model you want to use. The default provider is OpenAI.
2. Model: Specific model you want to use.
3. Use Personal Api Key: Lets you enter your own API key.

In the gear:

1. Max tokens: The maximum number of input + output tokens the model will take in and generate per run (1 token is roughly 4 characters). Note: different models have different token limits, and the workflow will error if the max token limit is reached.
2. Temperature: Controls the diversity of the LLM's output. For more diverse or creative generations, increase the temperature. For more deterministic responses, decrease the temperature.
3. Top P: Constrains how many tokens the LLM considers for generation at each step. For more diverse responses, increase Top P toward its maximum value of 1.0.
4. Stream Response: Check to stream responses from the LLM. Be sure to change the Type on the output node to “Streamed Text”.
5. JSON Output: Check to have the model return structured JSON output rather than plain text.
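
To get a feel for the Max tokens setting, the characters-per-token rule of thumb above can be turned into a rough budget check. The sketch below is illustrative only: the function names are hypothetical, the 4-characters-per-token ratio is an approximation that varies by model and tokenizer, and image tokens are not counted because providers account for them differently.

```python
import math

# Rule of thumb from the parameter description; real tokenizers vary.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token heuristic."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_budget(system: str, prompt: str, max_tokens: int, reserved_output: int) -> bool:
    """Check whether the text inputs leave room for `reserved_output` tokens.

    Only the System and Prompt text is counted; the image is ignored here.
    """
    used = estimate_tokens(system) + estimate_tokens(prompt)
    return used + reserved_output <= max_tokens


system = "Extract all the text from the image"
print(estimate_tokens(system))
print(fits_budget(system, "", max_tokens=1000, reserved_output=500))
```

An estimate like this is only useful for sizing the setting ahead of time; the Tokens Used output reports the actual count after a run.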

## Node Outputs

1. Text: The text generated by the LLM.
   * Type: `Text`
   * Example usage: `{{ai_image_to_text_0.text}}`
2. Tokens Used: The number of tokens used for the run.
   * Type: `Integer`
   * Example usage: `{{ai_image_to_text_0.tokens_used}}`
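
The `{{node_name.field}}` references above behave like template placeholders that are filled in with a node's outputs at run time. The following is a minimal sketch of that idea, not the platform's actual resolver: the `resolve` function, the regular expression, and the sample output values are all hypothetical and exist only to illustrate the substitution.

```python
import re

# Matches references such as {{ai_image_to_text_0.text}}
REFERENCE = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def resolve(template: str, outputs: dict) -> str:
    """Replace each {{node.field}} reference with the matching output value."""

    def substitute(match: re.Match) -> str:
        node, field = match.group(1), match.group(2)
        # Raises KeyError if the node or field does not exist.
        return str(outputs[node][field])

    return REFERENCE.sub(substitute, template)


# Hypothetical outputs from one run of the node (made-up values).
outputs = {"ai_image_to_text_0": {"text": "Soup 4.50", "tokens_used": 212}}

summary = resolve(
    "{{ai_image_to_text_0.text}} ({{ai_image_to_text_0.tokens_used}} tokens)",
    outputs,
)
print(summary)
```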

## Example

The example below shows a pipeline that takes an image of a food menu and converts it to text.

1. Input Node: Contains the input image of a food menu
2. Image to Text Node: Converts the image to text
   * System: `Extract all the text from the image`
   * Image: `{{input_0.image}}`
3. Output: The text generated from the image
   * Output: `{{ai_image_to_text_0.text}}`

<img src="https://mintcdn.com/vectorshift/fUdgBpX7VNpDIEaX/images/platform/pipelines/multi-modal/image-to-text-example.png?fit=max&auto=format&n=fUdgBpX7VNpDIEaX&q=85&s=92d5b0abe8cf4a7f624a68070920a3f0" alt="Image to Text Example" width="1905" height="834" data-path="images/platform/pipelines/multi-modal/image-to-text-example.png" />

## Pricing

| Provider  | Model                               | Input cost per 1000 tokens | Output cost per 1000 tokens |
| :-------- | :---------------------------------- | -------------------------: | --------------------------: |
| OpenAI    | gpt-4.5-preview                     |                      0.075 |                        0.15 |
| OpenAI    | gpt-4o                              |                     0.0025 |                        0.01 |
| OpenAI    | gpt-4o-mini                         |                    0.00015 |                      0.0006 |
| OpenAI    | chatgpt-4o-latest                   |                      0.005 |                       0.015 |
| OpenAI    | gpt-4o-2024-08-06                   |                     0.0025 |                        0.01 |
| OpenAI    | gpt-4-turbo-2024-04-09              |                       0.01 |                        0.03 |
| Anthropic | claude-3-haiku-20240307             |                    0.00025 |                     0.00125 |
| Anthropic | claude-3-opus-20240229              |                      0.015 |                       0.075 |
| Anthropic | claude-3-sonnet-20240229            |                      0.003 |                       0.015 |
| Anthropic | claude-3-5-sonnet-20240620          |                      0.003 |                       0.015 |
| Anthropic | claude-3-5-sonnet-20241022          |                      0.003 |                       0.015 |
| Anthropic | claude-3-7-sonnet-20250219          |                      0.003 |                       0.015 |
| Google    | gemini-1.5-flash                    |                   0.000075 |                      0.0003 |
| Google    | gemini-1.5-flash-preview-0514       |                   0.000075 |                0.0000046875 |
| Google    | gemini-2.0-flash-exp                |                          0 |                           0 |
| Google    | gemini-2.0-flash-thinking-exp       |                          0 |                           0 |
| Google    | gemini-2.0-flash-lite-preview-02-05 |                   0.000075 |                      0.0003 |
| Google    | gemini-2.0-flash-001                |                    0.00015 |                      0.0006 |
| XAI       | grok-2-vision                       |                      0.002 |                        0.01 |
