Put an image or a PDF into a chat message and the model can read it. It's the same messages array with a richer content field; there is no separate endpoint. Vision and PDF are model capabilities, so you send the media to a regular chat model that supports them. Not every model does. To find the ones that do, filter the catalog by capability, or call GET /v3/models?capability=vision (images) or GET /v3/models?capability=pdf (documents). The Claude, Gemini, and GPT families support both.
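As a rough sketch of the capability lookup, the snippet below builds the GET /v3/models request with the standard library. It assumes the endpoint lives on the same host as the compat base URL used on this page and that it accepts the API key as a Bearer token; neither is confirmed here, and the response shape is not documented on this page, so the body is returned undecoded beyond JSON parsing. The helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the models endpoint shares the host of the compat base URL.
MODELS_URL = "https://api.opper.ai/v3/models"


def build_models_request(capability: str, api_key: str) -> urllib.request.Request:
    """Build GET /v3/models?capability=<capability> without sending it."""
    query = urllib.parse.urlencode({"capability": capability})
    return urllib.request.Request(
        f"{MODELS_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )


def fetch_models(capability: str, api_key: str):
    """Send the request and return the parsed JSON body unchanged."""
    with urllib.request.urlopen(build_models_request(capability, api_key)) as resp:
        return json.load(resp)
```

Calling fetch_models("vision", os.environ["OPPER_API_KEY"]) would then return whatever JSON the endpoint sends back; swap in "pdf" to look for document-capable models.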

Images

Two ways to send an image: a hosted URL or inline base64.
import os, base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opper.ai/v3/compat",
    api_key=os.environ["OPPER_API_KEY"],
)

# Hosted URL
r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(r.choices[0].message.content)

# Inline base64
with open("cat.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this picture?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
Use the URL form when the image is already on the web. Use base64 for local files or anything not publicly reachable.
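The choice between the two forms can be folded into one helper. This is a minimal sketch, not part of any SDK: image_part is a hypothetical name, and it guesses the MIME type from the file extension instead of hard-coding image/jpeg.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(source: str) -> dict:
    """Return an image_url content part for a web URL or a local file path."""
    if source.startswith(("http://", "https://")):
        url = source  # already hosted: pass the URL through untouched
    else:
        mime, _ = mimetypes.guess_type(source)
        if mime is None or not mime.startswith("image/"):
            raise ValueError(f"cannot infer an image MIME type for {source!r}")
        b64 = base64.b64encode(Path(source).read_bytes()).decode("ascii")
        url = f"data:{mime};base64,{b64}"  # inline the file as a data URL
    return {"type": "image_url", "image_url": {"url": url}}
```

The returned dict drops straight into the content list next to the text part, for example "content": [{"type": "text", "text": "What's in this picture?"}, image_part("cat.jpg")].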

PDFs

PDFs work the same way. The model reads both the text and any embedded images (charts, diagrams, scanned pages).
Python
# Reuses the `client` configured in the Images example above.
import base64

with open("contract.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the key clauses in this contract."},
            {"type": "file", "file": {"file_data": f"data:application/pdf;base64,{b64}"}},
        ],
    }],
)
print(r.choices[0].message.content)
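When several documents go through the same code path, the file part can be built by a small helper. The sketch below uses a hypothetical pdf_part function that produces the same part shape as the example above and adds a cheap check for the %PDF- marker, so a mislabeled file fails locally before any request is made.

```python
import base64
from pathlib import Path


def pdf_part(path: str) -> dict:
    """Return a file content part carrying the PDF as a base64 data URL."""
    data = Path(path).read_bytes()
    if not data.startswith(b"%PDF-"):  # well-formed PDFs start with this marker
        raise ValueError(f"{path!r} does not look like a PDF")
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "file", "file": {"file_data": f"data:application/pdf;base64,{b64}"}}
```

Use it in place of the inline dict, for example "content": [{"type": "text", "text": "Summarize the key clauses in this contract."}, pdf_part("contract.pdf")].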

Free text or structured output

Need | Reach for
Show an image and ask a free-text question about it | A plain message (this page)
Extract structured fields from an image or PDF (a receipt, an invoice, a form) | Structured output with response_format
Run a multi-turn conversation about an uploaded document | A plain message (this page)
Batch process documents into a database | Structured output
Add response_format when you want typed JSON out of an image or PDF. Leave it off when the model just needs to talk about the media.
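To make the difference concrete, here is a sketch of a receipt-extraction request. It assumes the compat endpoint accepts the OpenAI-style json_schema form of response_format, which this page does not confirm; the Structured output page is the authority on the exact shape. The schema fields and the receipt_request helper are invented for illustration.

```python
import base64

# Hypothetical schema for a receipt; replace the fields with your own.
RECEIPT_SCHEMA = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["merchant", "total", "currency"],
    "additionalProperties": False,
}


def receipt_request(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "openai/gpt-5-mini",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the receipt fields."},
                {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
            ],
        }],
        # Assumed OpenAI-style structured output request.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "receipt", "schema": RECEIPT_SCHEMA, "strict": True},
        },
    }
```

With the client from the Images section, r = client.chat.completions.create(**receipt_request(data)) sends the request, and json.loads(r.choices[0].message.content) would give the typed fields if the model honors the schema. Dropping the response_format key turns the same request back into a plain free-text question.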

What’s next

Structured output

Multimodal input with typed JSON output.

Conversations

Multi-turn chat. Works with image and PDF messages too.

Models

Which models accept which input types.