Vision
The Vision feature lets you send images alongside your text messages to vision-capable AI models. The model can analyze, describe, and answer questions about the images you provide.
Supported Providers
Not all models support vision. The following providers and models can process images:
| Provider | Vision Models |
|---|---|
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.5, Claude Sonnet 4, Claude Opus 4, Claude Haiku 4.5 |
| OpenAI | GPT-5.1, GPT-4o, GPT-4o mini, o1, o3, o4-mini |
| xAI | Grok 4.1 Fast |
| Google Gemini | Gemini 3 Flash, Gemini 3 Pro, Gemini 2.5 Pro, Gemini 2.5 Flash |
| OpenRouter | Any vision-capable model available through OpenRouter |
The model registry indicates which models support vision via the vision capability tag. If a model does not support vision, the image data is still sent, but the model may ignore it or return an error.
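As a rough sketch of how such a capability check might work, assuming a hypothetical registry shape (the entry format and helper name here are illustrative, not the platform's actual API):

```javascript
// Hypothetical model registry; real entries and model IDs may differ.
const modelRegistry = {
  "claude-sonnet-4-5": { capabilities: ["text", "vision"] },
  "text-only-model":   { capabilities: ["text"] },
};

// Returns true only when the model exists and carries the vision tag.
function supportsVision(modelId) {
  const entry = modelRegistry[modelId];
  return Boolean(entry && entry.capabilities.includes("vision"));
}
```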
How to Send Images
There are three ways to attach an image to your message:
1. Paste from Clipboard (Ctrl+V / Cmd+V)
Copy an image from any source (screenshot tool, web browser, image editor) and paste it directly into the message input area. The Vision module listens for paste events on the input field and intercepts any clipboard items with an image/* MIME type. The image is read as a data URL via FileReader.readAsDataURL() and queued as a pending attachment.
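The interception flow described above can be sketched as follows. The helper names (`attachPasteHandler`, `queuePendingImage`) are illustrative, not the platform's actual identifiers:

```javascript
// True for any image/* clipboard item (image/png, image/jpeg, ...).
function isImageItem(mimeType) {
  return mimeType.startsWith("image/");
}

// Browser-only wiring: intercepts pasted images and reads each one
// as a data URL before handing it to the pending-attachment queue.
function attachPasteHandler(inputEl, queuePendingImage) {
  inputEl.addEventListener("paste", (event) => {
    for (const item of event.clipboardData.items) {
      if (!isImageItem(item.type)) continue;
      event.preventDefault(); // keep raw image bytes out of the text field
      const reader = new FileReader();
      reader.onload = () => queuePendingImage(reader.result); // data URL
      reader.readAsDataURL(item.getAsFile());
    }
  });
}
```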
2. Upload Button
Click the camera icon button (📷) next to the Send button. A hidden file input opens where you can select an image from your device. The file input accepts image/* types. After selection, the image is read as a data URL and queued.
3. Drag and Drop
Drag an image file from your file manager and drop it onto the message input area. The drop handler reads the file as a data URL using the same FileReader pipeline.
Image Preview
Once an image is attached, a thumbnail preview appears above the input area. The preview shows:
- A 48px-tall thumbnail of the attached image with a rounded border
- A close button (x) to remove the image before sending
You can type your text message alongside the image. The preview persists until you either send the message or click the close button to discard it.
You can attach an image and send it with no text. Just paste or upload the image and hit Enter. The model will analyze the image and describe what it sees.
Sending the Message
When you click Send (or press Enter), both your text and the attached image are sent together as a single message. The image is encoded as a base64 data URL and included in the API request as a multipart content array.
After sending, the pending image is cleared automatically and the preview is hidden. The user message in the chat history stores the image data internally for the API call.
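A minimal sketch of how text and a pending image might be combined into one user message, assuming the generic `image_url` content shape shown later in this document (field and function names here are assumptions):

```javascript
// Builds a single multipart user message from optional text and an
// optional pending image data URL. Either part may be omitted.
function buildUserMessage(text, pendingImage) {
  const content = [];
  if (pendingImage) {
    content.push({ type: "image_url", image_url: { url: pendingImage } });
  }
  if (text) {
    content.push({ type: "text", text });
  }
  return { role: "user", content };
}
```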
Image Format Support
The following image formats are supported (any format the browser accepts under the image/* MIME type):
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- WebP (.webp)
The Vision module validates that the pasted or uploaded data matches the pattern data:image/{type};base64,{data} before accepting it.
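A minimal validation sketch matching that pattern (the platform's exact regex may differ):

```javascript
// Accepts only well-formed base64 image data URLs, e.g.
// "data:image/png;base64,iVBORw0KGgo...". Rejects non-image MIME
// types and empty payloads.
function isValidImageDataUrl(value) {
  return /^data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+/]+=*$/i.test(value);
}
```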
Large images increase API costs because they consume more tokens. Most providers have image size limits. Images are sent as base64-encoded data, so a 1 MB image adds roughly 1.3 MB to the request payload. Consider resizing very large images before sending.
Provider-Specific Formatting
The platform automatically formats image data according to each provider's API requirements. The formatMessage() function handles this based on the providerFormat parameter:
Anthropic Format
Anthropic uses a native image content block with explicit base64 encoding:
```json
[
  {
    "type": "image",
    "source": {
      "type": "base64",
      "media_type": "image/png",
      "data": "iVBORw0KGgo..."
    }
  },
  {
    "type": "text",
    "text": "What is shown in this image?"
  }
]
```
The MIME type is extracted from the data URL (e.g., image/png, image/jpeg) and passed as media_type. The base64 payload is extracted separately from the data URL prefix.
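That extraction step can be sketched as a small pure function. This mirrors the formatMessage() behavior described above, but the helper name and exact implementation are assumptions:

```javascript
// Splits a base64 image data URL into Anthropic's image content block:
// the MIME type becomes media_type, the payload after the comma becomes data.
function toAnthropicImageBlock(dataUrl) {
  const match = dataUrl.match(/^data:(image\/[a-z0-9.+-]+);base64,(.+)$/i);
  if (!match) throw new Error("not a base64 image data URL");
  return {
    type: "image",
    source: { type: "base64", media_type: match[1], data: match[2] },
  };
}
```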
OpenAI, xAI, Gemini, and OpenRouter Format
All other providers use the image_url content block format with the full data URL:
```json
[
  {
    "type": "image_url",
    "image_url": {
      "url": "data:image/png;base64,iVBORw0KGgo..."
    }
  },
  {
    "type": "text",
    "text": "What is shown in this image?"
  }
]
```
For Gemini, the platform's provider handler further converts this into Gemini's native inlineData format with mimeType and data fields.
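A sketch of that further conversion, assuming the `image_url` block shape shown above. The mimeType and data field names follow Gemini's inlineData format; the helper name is illustrative:

```javascript
// Converts a generic image_url content block into a Gemini inlineData
// part by splitting the data URL into its MIME type and base64 payload.
function imageUrlBlockToGeminiPart(block) {
  const match = block.image_url.url.match(/^data:([^;]+);base64,(.+)$/);
  if (!match) throw new Error("expected a base64 data URL");
  return { inlineData: { mimeType: match[1], data: match[2] } };
}
```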
You do not need to handle any of this -- it is automatic based on the selected provider.
One Image Per Message
You can send one image per message. To discuss multiple images, send them in separate messages. The model retains context from previous messages in the conversation, so you can say "compare this image to the one I sent earlier."
Enable/Disable Vision
Vision is enabled by default. You can toggle it in Settings > Capabilities. When disabled, the image upload button and paste handling are deactivated, and the camera icon is not rendered next to the Send button.
Use Cases
- Screenshot analysis -- paste a screenshot and ask "What error is shown here?"
- Document reading -- photograph a document and ask the model to extract text or summarize
- Code review -- share a screenshot of code and ask for improvements
- Design feedback -- upload a mockup and get design suggestions
- Math problems -- photograph a math problem and ask for a solution
- Data visualization -- share a chart and ask for interpretation
- Diagrams -- upload architecture diagrams and ask the model to explain the flow
Vision works with any model that has the vision capability in the registry. If you are using a text-only model, attached images will not be processed meaningfully. Check the model registry to confirm your model supports vision.