Vision & images

How to send images to models across the OpenAI- and Anthropic-compatible endpoints.

  • OpenAI Chat Completions: POST /api/openai/v1/chat/completions
  • OpenAI Responses: POST /api/openai/v1/responses
  • Anthropic Messages: POST /api/anthropic/v1/messages

OpenAI Chat Completions

Images are provided as image_url content parts. The image_url object takes a url (an HTTPS URL or a base64 data URL) and an optional detail field: "auto" (the default), "high", or "low".

{
  "model": "gpt-4o-2024-11-20",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe the image" },
        { "type": "image_url", "image_url": { "url": "https://example.com/cat.png", "detail": "high" } }
      ]
    }
  ]
}
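The request above can be assembled with only the standard library. This is a minimal sketch: BASE_URL and build_chat_request are hypothetical names, and the Bearer authorization header is an assumption about how your gateway authenticates, so adjust both to your deployment. The function builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical gateway host; replace with your deployment's base URL.
BASE_URL = "https://gateway.example.com"


def build_chat_request(api_key: str, prompt: str, image_url: str,
                       detail: str = "auto",
                       model: str = "gpt-4o-2024-11-20") -> urllib.request.Request:
    """Build (but do not send) a Chat Completions request with one image."""
    if detail not in ("auto", "low", "high"):
        raise ValueError(f"unsupported detail: {detail!r}")
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": image_url, "detail": detail}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        BASE_URL + "/api/openai/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the gateway accepts an OpenAI-style bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request("sk-test", "Describe the image",
                         "https://example.com/cat.png", detail="high")
# Once BASE_URL and the key are real, send it with urllib.request.urlopen(req).
```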

You can also embed an image inline as a base64 data URL of the form data:<media type>;base64,<data>:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this picture?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0K..." } }
  ]
}
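A data URL like the one above can be produced from raw bytes with the standard base64 module. The helpers below are hypothetical names; file_to_data_url guesses the MIME type from the file extension via mimetypes and refuses anything that is not image/*.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a data URL for an image_url part."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read an image file and infer its MIME type from the extension."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    return to_data_url(Path(path).read_bytes(), mime_type)


# The first four bytes of a PNG signature, used here as a stand-in image.
part = {"type": "image_url",
        "image_url": {"url": to_data_url(b"\x89PNG", "image/png")}}
```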

OpenAI Responses

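Images go in input as input_image content parts inside a user message. Set image_url to an HTTPS URL or a base64 data URL. Unlike Chat Completions, image_url here is a plain string, and the optional detail field ("auto" | "high" | "low") sits next to it.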

{
  "model": "gpt-4o-2024-11-20",
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "Transcribe the sign" },
        { "type": "input_image", "image_url": "https://example.com/sign.jpg", "detail": "auto" }
      ]
    }
  ]
}
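As a sketch, the Responses body can be built as a plain dictionary following OpenAI's Responses schema, where input_image carries image_url as a string rather than an object. build_responses_payload is a hypothetical helper name; serialize the result and POST it to /api/openai/v1/responses.

```python
import json


def build_responses_payload(prompt: str, image_url: str, detail: str = "auto",
                            model: str = "gpt-4o-2024-11-20") -> dict:
    """Build a Responses body with one text part and one image part."""
    return {
        "model": model,
        "input": [
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": prompt},
                    # image_url is a plain string here; detail sits beside it.
                    {"type": "input_image", "image_url": image_url, "detail": detail},
                ],
            }
        ],
    }


payload = build_responses_payload("Transcribe the sign", "https://example.com/sign.jpg")
body = json.dumps(payload)  # request body for POST /api/openai/v1/responses
```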

Base64 example:

{
  "input": [
    {
      "role": "user",
      "content": [
        { "type": "input_text", "text": "What is shown?" },
        { "type": "input_image", "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ..." }
      ]
    }
  ]
}
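If you already build Chat Completions content, the parts map onto Responses parts mechanically: text becomes input_text, and the image_url object is flattened into an input_image part with a string image_url. The converter below is a hypothetical sketch of that mapping; it handles only text and image parts and raises on anything else.

```python
def chat_part_to_responses(part: dict) -> dict:
    """Convert one Chat Completions content part to its Responses equivalent."""
    if part["type"] == "text":
        return {"type": "input_text", "text": part["text"]}
    if part["type"] == "image_url":
        image = part["image_url"]
        return {
            "type": "input_image",
            "image_url": image["url"],
            # Both APIs default detail to "auto" when it is omitted.
            "detail": image.get("detail", "auto"),
        }
    raise ValueError(f"unsupported content part type: {part['type']!r}")


chat_content = [
    {"type": "text", "text": "What is shown?"},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ"}},
]
responses_input = [
    {"role": "user", "content": [chat_part_to_responses(p) for p in chat_content]}
]
```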

Anthropic Messages

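Images are sent as image content blocks. For inline data, set source.type to "base64", set media_type to the image's MIME type (image/jpeg, image/png, image/gif, or image/webp), and put the bare base64 string in data, without a data: URL prefix.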

{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 256,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe the image" },
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": "iVBORw0KGgoAAAANSUhEUg..."
          }
        }
      ]
    }
  ]
}
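Because the OpenAI endpoints take data URLs while Anthropic Messages wants the media type and bare base64 payload as separate fields, a small converter helps when the same image goes to both. This is a hypothetical helper, sketched with plain string handling; it accepts only base64 data URLs with one of the four media types listed above.

```python
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def data_url_to_anthropic_block(data_url: str) -> dict:
    """Turn a data:<mime>;base64,<payload> URL into an Anthropic image block."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    media_type = header[len("data:"):-len(";base64")]
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type!r}")
    return {
        "type": "image",
        # Anthropic expects the bare base64 payload, not the full data URL.
        "source": {"type": "base64", "media_type": media_type, "data": payload},
    }


block = data_url_to_anthropic_block("data:image/png;base64,iVBORw0KGgo")
```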

Tips and limits

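  • Prefer detail: "low" for thumbnails or when latency is critical. detail applies to the OpenAI endpoints only; Anthropic Messages has no such field.
  • Large images increase input tokens and latency; resize before sending when fine detail is not needed.
  • Stay within provider size limits: OpenAI has documented about 20 MB per image, and Anthropic's API rejects images over 5 MB. Practical sizes are much smaller.
  • If you pass image links, use HTTPS URLs that are reachable from the server side, not only from your own network.
  • For multi-image prompts, add several image parts (image_url, input_image, or image, depending on the endpoint) to the same content array, in the order you want the model to read them.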
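To act on the resizing tip, it helps to compute a target size before handing the image to whatever imaging library you use. The sketch below assumes Anthropic's published guidance: images whose long edge exceeds 1568 px are downscaled, and token cost is roughly width × height / 750. Both function names are hypothetical, and the token figure is an estimate, not a billing guarantee; OpenAI models use a different, tile-based calculation.

```python
import math

# Anthropic's documented long-edge threshold before server-side downscaling.
ANTHROPIC_MAX_EDGE = 1568


def fit_within(width: int, height: int, max_edge: int) -> tuple[int, int]:
    """Scale (width, height) down, keeping aspect ratio, so the long edge <= max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def approx_claude_image_tokens(width: int, height: int) -> int:
    """Rough token estimate from Anthropic's width * height / 750 rule of thumb."""
    return math.ceil(width * height / 750)


target = fit_within(3136, 1568, ANTHROPIC_MAX_EDGE)
estimate = approx_claude_image_tokens(*target)
```

Resizing on the client side before upload saves bandwidth and avoids paying for pixels the provider would discard anyway.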