Vision & PDFs

Put an image or a PDF into a chat message and the model can read it. It's the same messages array with a richer content field, and there is no separate endpoint. Vision and PDF are model capabilities, so you send the media to a regular chat model that supports them. Not every model does. To find the ones that do, filter the catalog by capability, or call GET /v3/models?capability=vision (images) or GET /v3/models?capability=pdf (documents). The Claude, Gemini, and GPT families support both.
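
If you want to check capabilities from code, a minimal sketch of the catalog lookup might look like this. It assumes the models endpoint sits on the same host as the chat endpoint (https://api.opper.ai) and that the key is sent as a Bearer token; neither is confirmed on this page. The helper names `models_url` and `fetch_models` are made up for the example, and the response is printed as raw JSON because its shape isn't documented here.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: the models endpoint lives on the same host as the chat endpoint.
API_ROOT = "https://api.opper.ai"


def models_url(capability: str) -> str:
    """Build the catalog URL that filters models by one capability."""
    query = urllib.parse.urlencode({"capability": capability})
    return f"{API_ROOT}/v3/models?{query}"


def fetch_models(capability: str, api_key: str):
    """Fetch the filtered catalog and return the decoded JSON unchanged."""
    request = urllib.request.Request(
        models_url(capability),
        # Assumption: the API key is accepted as a Bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Only touch the network when a key is actually configured.
if __name__ == "__main__" and os.environ.get("OPPER_API_KEY"):
    for capability in ("vision", "pdf"):
        catalog = fetch_models(capability, os.environ["OPPER_API_KEY"])
        # Print a short preview of whatever the endpoint returned.
        print(capability, json.dumps(catalog, indent=2)[:500])
```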

Images

Two ways to send an image: a hosted URL or inline base64.

Python

import base64
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opper.ai/v3/compat",
    api_key=os.environ["OPPER_API_KEY"],
)

# Hosted URL
r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(r.choices[0].message.content)

# Inline base64
with open("cat.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()
r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this picture?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(r.choices[0].message.content)

TypeScript

import OpenAI from "openai";
import { readFileSync } from "node:fs";

const client = new OpenAI({
    baseURL: "https://api.opper.ai/v3/compat",
    apiKey: process.env.OPPER_API_KEY!,
});

// Hosted URL
let r = await client.chat.completions.create({
    model: "openai/gpt-5-mini",
    messages: [{
        role: "user",
        content: [
            { type: "text", text: "What's in this picture?" },
            { type: "image_url", image_url: { url: "https://example.com/cat.jpg" } },
        ],
    }],
});
console.log(r.choices[0].message.content);

// Inline base64
const b64 = readFileSync("cat.jpg").toString("base64");
r = await client.chat.completions.create({
    model: "openai/gpt-5-mini",
    messages: [{
        role: "user",
        content: [
            { type: "text", text: "What's in this picture?" },
            { type: "image_url", image_url: { url: `data:image/jpeg;base64,${b64}` } },
        ],
    }],
});
console.log(r.choices[0].message.content);

Use the URL form when the image is already on the web. Use base64 for local files or anything not publicly reachable.
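
One way to fold that choice into a single helper is sketched below. `image_part` is a hypothetical name, not part of any SDK. It passes web URLs through unchanged and turns a local file into a data URL, guessing the MIME type from the file extension instead of hard-coding image/jpeg.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(source: str) -> dict:
    """Return an image_url content part for a web URL or a local file path."""
    if source.startswith(("http://", "https://")):
        # Already on the web: pass the URL through unchanged.
        return {"type": "image_url", "image_url": {"url": source}}

    # Local file: guess the MIME type from the file extension.
    mime, _ = mimetypes.guess_type(source)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {source}")

    # Encode the bytes and wrap them in a data URL.
    b64 = base64.b64encode(Path(source).read_bytes()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
```

Use `image_part("cat.jpg")` or `image_part("https://example.com/cat.jpg")` in place of the hand-written image_url entry in the content list.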

PDFs

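PDFs work the same way, with one difference: the content part has type file and carries the document as a base64 data URL in file_data. The model reads both the text and any embedded images (charts, diagrams, scanned pages).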

Python

import base64

# Reuses the client created in the Images example.

with open("contract.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the key clauses in this contract."},
            {"type": "file", "file": {"file_data": f"data:application/pdf;base64,{b64}"}},
        ],
    }],
)
print(r.choices[0].message.content)
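
If you send PDFs from several places, the encoding step can live in a small helper. This is a sketch: `pdf_part` is a hypothetical name, and the check for the "%PDF-" file marker is an extra safeguard added here, not something the API requires.

```python
import base64
from pathlib import Path


def pdf_part(path: str) -> dict:
    """Return a file content part carrying a local PDF as a base64 data URL."""
    data = Path(path).read_bytes()
    # PDF files begin with the marker "%PDF-"; reject anything else early.
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{path} does not look like a PDF")
    b64 = base64.b64encode(data).decode()
    return {"type": "file", "file": {"file_data": f"data:application/pdf;base64,{b64}"}}
```

Use `pdf_part("contract.pdf")` in place of the hand-written file entry in the content list.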

Free text or structured output

| Need | Reach for |
| --- | --- |
| Show an image and ask a free-text question about it | A plain message (this page) |
| Extract structured fields from an image or PDF (a receipt, an invoice, a form) | Structured output with `response_format` |
| Run a multi-turn conversation about an uploaded document | A plain message (this page) |
| Batch-process documents into a database | Structured output |

Add response_format when you want typed JSON out of an image or PDF. Leave it off when the model just needs to talk about the media.
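
As a rough illustration, here is what a receipt extraction request could look like. It assumes the endpoint accepts the OpenAI-style json_schema form of response_format, which this page does not confirm; see the Structured output page for the supported shape. The schema fields, the `receipt_request` helper, and the receipt URL are all made up for the example.

```python
import json
import os

# Hypothetical schema for a receipt; the field names are illustrative only.
RECEIPT_SCHEMA = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["merchant", "total", "currency"],
    "additionalProperties": False,
}


def receipt_request(image_url: str) -> dict:
    """Build keyword arguments for client.chat.completions.create."""
    return {
        "model": "openai/gpt-5-mini",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the receipt fields."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        # Assumption: the endpoint accepts the OpenAI json_schema form.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "receipt", "schema": RECEIPT_SCHEMA, "strict": True},
        },
    }


# Only call the API when a key is actually configured.
if __name__ == "__main__" and os.environ.get("OPPER_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.opper.ai/v3/compat",
        api_key=os.environ["OPPER_API_KEY"],
    )
    # Placeholder URL; point this at a real, publicly reachable receipt image.
    r = client.chat.completions.create(**receipt_request("https://example.com/receipt.jpg"))
    print(json.loads(r.choices[0].message.content))
```

The message itself is unchanged from the free-text case. Only the added response_format field asks for typed JSON.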

What’s next

Structured output

Multimodal input with typed JSON output.

Conversations

Multi-turn chat. Works with image and PDF messages too.

Models

Which models accept which input types.
