The gateway isn’t text-only. The same endpoint and API key reach models that see images and PDFs, generate images, speak and transcribe, and produce video — each with the routing, governance, and tracing you get on every call.

Modalities

Vision & PDFs

Send images and documents into a model as message content, with optional structured output.

Images

Generate and edit images with POST /v3/images. Sora, GPT Image, Imagen, and more.

Audio

Text to speech and speech to text with POST /v3/audio/speech and /v3/audio/transcriptions.

Video

Generate video from a prompt or reference image with POST /v3/videos.

Realtime voice

Two-way voice over WebSocket. OpenAI, xAI, and Gemini behind one protocol.

Models

The full catalog, with each model’s input and output modalities marked.

Discovering what a model can do

Each modality has a discovery endpoint that reports the models available and their capabilities, so you don’t have to hardcode a list:
| Endpoint | Returns |
| --- | --- |
| GET /v3/images/models | Models for POST /v3/images, with sizes, aspect ratios, and edit support. |
| GET /v3/audio/models | Speech (tts) and transcription (stt) models, with voices and formats. |
| GET /v3/videos/models | Models for POST /v3/videos, with resolutions, aspect ratios, and max duration. |
curl -s "https://api.opper.ai/v3/images/models" \
  -H "Authorization: Bearer $OPPER_API_KEY"
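
The same discovery call can be sketched in Python for all three modalities. This is a minimal illustration using only the standard library. The base URL, paths, and bearer header come from the text above; the helper names `build_discovery_request` and `fetch_models` are hypothetical. The response schema is not documented on this page, so the sketch returns the parsed JSON unchanged rather than assuming any field names.

```python
import json
import urllib.request

BASE_URL = "https://api.opper.ai"  # host taken from the curl example

# Discovery paths as listed for each modality.
DISCOVERY_PATHS = {
    "images": "/v3/images/models",
    "audio": "/v3/audio/models",
    "videos": "/v3/videos/models",
}


def build_discovery_request(modality: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one modality's model list."""
    path = DISCOVERY_PATHS[modality]  # raises KeyError for an unknown modality
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_models(modality: str, api_key: str):
    """Send the discovery request and return the parsed JSON body as-is.

    No response field names are assumed here; inspect the returned
    structure before relying on any particular key.
    """
    request = build_discovery_request(modality, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_models("videos", os.environ["OPPER_API_KEY"])` at startup, for example, would let an application pick a model from the live list instead of a hardcoded one.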

Input, output, and storage

Generation endpoints (/v3/images, /v3/audio/speech, /v3/videos) return the result inline as base64 by default. They also persist a copy to Files, so you get a reusable file_id and a presigned url. That stored output can be fed straight into a later call without re-encoding: an image into a video generation, or an audio file_id into a transcription. Persistence respects your retention rules and is skipped on zero-data-retention projects.
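
The chaining described above might look roughly like the following Python sketch. Only the endpoint paths, the base64 inline result, and the existence of `file_id` come from this page. Everything else is an assumption made for illustration: that `file_id` appears as a top-level key in the response, the request field names `prompt` and `image_file_id`, and the helper names themselves. Check the endpoint reference for the real request and response shapes.

```python
import base64
import json
import urllib.request

BASE_URL = "https://api.opper.ai"


def build_post_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to a generation endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_inline_result(b64_data: str) -> bytes:
    """Turn an inline base64 result into raw bytes, e.g. to write to disk."""
    return base64.b64decode(b64_data)


def video_payload_from_image(image_result: dict, prompt: str) -> dict:
    """Reuse a stored image as the reference for a video generation.

    ASSUMPTION: the image response exposes "file_id" at the top level.
    ASSUMPTION: "prompt" and "image_file_id" are hypothetical request
    field names, not confirmed by this page.
    """
    return {"prompt": prompt, "image_file_id": image_result["file_id"]}
```

Passing the stored `file_id` rather than the decoded bytes is what avoids the re-encoding step. On a zero-data-retention project no copy is persisted, so a caller should be prepared for `file_id` to be missing and fall back to the inline data.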