Vision
The Vision feature lets you send images alongside your text messages to vision-capable AI models. The model can analyze, describe, and answer questions about the images you provide.
Supported Providers
Not all models support vision. The following providers and models can process images:
| Provider | Vision Models |
|---|---|
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.5, Claude Sonnet 4, Claude Opus 4, Claude Haiku 4.5 |
| OpenAI | GPT-5.1, GPT-4o, GPT-4o mini, o1, o3, o4-mini |
| xAI | Grok 4.1 Fast |
| Google Gemini | Gemini 3 Flash, Gemini 3 Pro, Gemini 2.5 Pro, Gemini 2.5 Flash |
| OpenRouter | Any vision-capable model available through OpenRouter |
The model registry indicates which models support vision via the vision capability tag. If a model does not support vision, the image data is still sent, but the model may ignore it or return an error.
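As a rough sketch of how such a capability check might work, assuming a hypothetical registry shape (the entry format and helper name here are illustrative, not the platform's actual API):

```javascript
// Hypothetical model registry; real entries and model IDs may differ.
const modelRegistry = {
  "claude-sonnet-4-5": { capabilities: ["text", "vision"] },
  "text-only-model":   { capabilities: ["text"] },
};

// Returns true only when the model exists and carries the vision tag.
function supportsVision(modelId) {
  const entry = modelRegistry[modelId];
  return Boolean(entry && entry.capabilities.includes("vision"));
}
```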
How to Send Images
There are three ways to attach an image to your message:
1. Paste from Clipboard (Ctrl+V / Cmd+V)
Copy an image from any source (screenshot tool, web browser, image editor) and paste it directly into the message input area. The Vision module listens for paste events on the input field and intercepts any clipboard items with an image/* MIME type. The image is read as a data URL via FileReader.readAsDataURL() and queued as a pending attachment.
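The interception flow described above can be sketched as follows. The helper names (`attachPasteHandler`, `queuePendingImage`) are illustrative, not the platform's actual identifiers:

```javascript
// True for any image/* clipboard item (image/png, image/jpeg, ...).
function isImageItem(mimeType) {
  return mimeType.startsWith("image/");
}

// Browser-only wiring: intercepts pasted images and reads each one
// as a data URL before handing it to the pending-attachment queue.
function attachPasteHandler(inputEl, queuePendingImage) {
  inputEl.addEventListener("paste", (event) => {
    for (const item of event.clipboardData.items) {
      if (!isImageItem(item.type)) continue;
      event.preventDefault(); // keep raw image bytes out of the text field
      const reader = new FileReader();
      reader.onload = () => queuePendingImage(reader.result); // data URL
      reader.readAsDataURL(item.getAsFile());
    }
  });
}
```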
2. Upload Button
Click the camera icon button (📷) next to the Send button. A hidden file input opens where you can select an image from your device. The file input accepts image/* types. After selection, the image is read as a data URL and queued.
3. Drag and Drop
Drag an image file from your file manager and drop it onto the message input area. The drop handler reads the file as a data URL using the same FileReader pipeline.
Image Preview
Once an image is attached, a thumbnail preview appears above the input area. The preview shows:
- A 48px-tall thumbnail of the attached image with a rounded border
- A close button (x) to remove the image before sending
You can type your text message alongside the image. The preview persists until you either send the message or click the close button to discard it.
You can attach an image and send it with no text. Just paste or upload the image and hit Enter. The model will analyze the image and describe what it sees.
Sending the Message
When you click Send (or press Enter), both your text and the attached image are sent together as a single message. The image is encoded as a base64 data URL and included in the API request as a multipart content array.
After sending, the pending image is cleared automatically and the preview is hidden. The user message in the chat history stores the image data internally for the API call.
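A minimal sketch of how text and a pending image might be combined into one user message, assuming the generic `image_url` content shape shown later in this document (field and function names here are assumptions):

```javascript
// Builds a single multipart user message from optional text and an
// optional pending image data URL. Either part may be omitted.
function buildUserMessage(text, pendingImage) {
  const content = [];
  if (pendingImage) {
    content.push({ type: "image_url", image_url: { url: pendingImage } });
  }
  if (text) {
    content.push({ type: "text", text });
  }
  return { role: "user", content };
}
```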
Image Format Support
The following image formats are supported (any format the browser accepts under the image/* MIME type):
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- WebP (.webp)
The Vision module validates that the pasted or uploaded data matches the pattern data:image/{type};base64,{data} before accepting it.
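A minimal validation sketch matching that pattern (the platform's exact regex may differ):

```javascript
// Accepts only well-formed base64 image data URLs, e.g.
// "data:image/png;base64,iVBORw0KGgo...". Rejects non-image MIME
// types and empty payloads.
function isValidImageDataUrl(value) {
  return /^data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+/]+=*$/i.test(value);
}
```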
Large images increase API costs because they consume more tokens. Most providers have image size limits. Images are sent as base64-encoded data, so a 1 MB image adds roughly 1.3 MB to the request payload. Consider resizing very large images before sending.
Provider-Specific Formatting
The platform automatically formats image data according to each provider's API requirements. The formatMessage() function handles this based on the providerFormat parameter:
Anthropic Format
Anthropic uses a native image content block with explicit base64 encoding:
```json
[
  {
    "type": "image",
    "source": {
      "type": "base64",
      "media_type": "image/png",
      "data": "iVBORw0KGgo..."
    }
  },
  {
    "type": "text",
    "text": "What is shown in this image?"
  }
]
```
The MIME type is extracted from the data URL (e.g., image/png, image/jpeg) and passed as media_type. The base64 payload is extracted separately from the data URL prefix.
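That extraction step can be sketched as a small pure function. This mirrors the formatMessage() behavior described above, but the helper name and exact implementation are assumptions:

```javascript
// Splits a base64 image data URL into Anthropic's image content block:
// the MIME type becomes media_type, the payload after the comma becomes data.
function toAnthropicImageBlock(dataUrl) {
  const match = dataUrl.match(/^data:(image\/[a-z0-9.+-]+);base64,(.+)$/i);
  if (!match) throw new Error("not a base64 image data URL");
  return {
    type: "image",
    source: { type: "base64", media_type: match[1], data: match[2] },
  };
}
```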
OpenAI, xAI, Gemini, and OpenRouter Format
All other providers use the image_url content block format with the full data URL:
```json
[
  {
    "type": "image_url",
    "image_url": {
      "url": "data:image/png;base64,iVBORw0KGgo..."
    }
  },
  {
    "type": "text",
    "text": "What is shown in this image?"
  }
]
```
For Gemini, the platform's provider handler further converts this into Gemini's native inlineData format with mimeType and data fields.
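A sketch of that further conversion, assuming the `image_url` block shape shown above. The mimeType and data field names follow Gemini's inlineData format; the helper name is illustrative:

```javascript
// Converts a generic image_url content block into a Gemini inlineData
// part by splitting the data URL into its MIME type and base64 payload.
function imageUrlBlockToGeminiPart(block) {
  const match = block.image_url.url.match(/^data:([^;]+);base64,(.+)$/);
  if (!match) throw new Error("expected a base64 data URL");
  return { inlineData: { mimeType: match[1], data: match[2] } };
}
```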
You do not need to handle any of this -- it is automatic based on the selected provider.
One Image Per Message
You can send one image per message. To discuss multiple images, send them in separate messages. The model retains context from previous messages in the conversation, so you can say "compare this image to the one I sent earlier."
Enable/Disable Vision
Vision is enabled by default. You can toggle it in Settings > Capabilities. When disabled, the image upload button and paste handling are deactivated, and the camera icon is not rendered next to the Send button.
Use Cases
- Screenshot analysis -- paste a screenshot and ask "What error is shown here?"
- Document reading -- photograph a document and ask the model to extract text or summarize
- Code review -- share a screenshot of code and ask for improvements
- Design feedback -- upload a mockup and get design suggestions
- Math problems -- photograph a math problem and ask for a solution
- Data visualization -- share a chart and ask for interpretation
- Diagrams -- upload architecture diagrams and ask the model to explain the flow
Vision works with any model that has the vision capability in the registry. If you are using a text-only model, attached images will not be processed meaningfully. Check the model registry to confirm your model supports vision.