Vision & PDFs

Put an image or a PDF into a chat message and the model can read it. The request uses the same messages array; only the content field changes, from a plain string to a list of typed parts (text, image, file). Not every model supports every input type, so check the models catalog for which models accept images and which accept PDFs.

Images

Two ways to send an image: a hosted URL or inline base64.

import os, base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opper.ai/v3/compat",
    api_key=os.environ["OPPER_API_KEY"],
)

# Hosted URL
r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(r.choices[0].message.content)

# Inline base64
with open("cat.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this picture?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(r.choices[0].message.content)

Use the URL form when the image is already on the web. Use base64 for local files or anything not publicly reachable.
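The base64 example above hard-codes `image/jpeg` in the data URL. For other formats, the MIME type can be derived from the file extension. The following is a minimal sketch using only the standard library; `image_part` is a hypothetical helper name, not part of any SDK.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(path: str) -> dict:
    """Build an image_url content part from a local file (hypothetical helper)."""
    # Guess the MIME type from the extension, e.g. ".png" -> "image/png".
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
```

The returned dict can be placed in the content list next to the text part, in place of the hand-written `image_url` entry.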

PDFs

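PDFs work the same way, except that the content part has type `file` and carries the document as a base64 data URL in `file_data`. The model reads both the text and any embedded images (charts, diagrams, scanned pages).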


import base64

# Reuses the `client` configured in the Images example.
with open("contract.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the key clauses in this contract."},
            {"type": "file", "file": {"file_data": f"data:application/pdf;base64,{b64}"}},
        ],
    }],
)
print(r.choices[0].message.content)
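When several documents go through the same flow, the encoding step can be wrapped in a small function. This is a sketch under stated assumptions: `pdf_part` and `pdf_question` are hypothetical helper names, and the `%PDF-` signature check is an optional local sanity check, not something the API is known to require.

```python
import base64
from pathlib import Path


def pdf_part(path: str) -> dict:
    """Build a file content part from a local PDF (hypothetical helper)."""
    data = Path(path).read_bytes()
    # PDF files normally begin with the "%PDF-" signature; fail early otherwise.
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{path} does not look like a PDF")
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "file", "file": {"file_data": f"data:application/pdf;base64,{b64}"}}


def pdf_question(path: str, question: str) -> dict:
    """Build one user message that pairs a question with a PDF."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": question}, pdf_part(path)],
    }
```

The result of `pdf_question("contract.pdf", "Summarize the key clauses in this contract.")` can be passed as the single entry in `messages`.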

Chat vs JSON for media

Need	Reach for
Show an image and ask a free-text question about it	Chat API (this page)
Extract structured fields from an image or PDF (a receipt, an invoice, a form)	JSON API with `output_schema`
Run a multi-turn conversation about an uploaded document	Chat API (this page)
Batch process documents into a database	JSON API

Use the JSON API when you want structured JSON out of an image or PDF. Use the Chat API when the model needs to converse about the media.
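For the multi-turn case, the document is attached once in the first user message and later turns are plain text. The sketch below assumes the usual chat-completions convention that each request carries the full message history. `add_turn` is a hypothetical helper, and the data URL is a placeholder rather than a real document.

```python
def add_turn(history: list[dict], reply: str, question: str) -> list[dict]:
    """Return a new history with the assistant reply and the next user question."""
    return history + [
        {"role": "assistant", "content": reply},
        {"role": "user", "content": question},
    ]


# First turn: the question plus the PDF part (placeholder data URL).
history = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize the key clauses in this contract."},
        {"type": "file", "file": {"file_data": "data:application/pdf;base64,<BASE64>"}},
    ],
}]

# After the first response arrives, append it and ask a follow-up.
history = add_turn(history, "<first model reply>", "Which clause covers termination?")
# The follow-up request would then pass messages=history to the same chat call.
```

Because the first message stays in the history, the follow-up question can refer to the document without attaching it again.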

What’s next

JSON API: schemas

Multimodal input with typed JSON output.

Conversations

Multi-turn chat. Works with image and PDF messages too.

Models

Which models accept which input types.
