Chat completions

The simplest Chat API call: a list of messages goes in and a single response comes out. Every model supports this same request and response shape. If you haven’t set up your SDK yet, start with Drop-in SDKs.

A minimal call

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opper.ai/v3/compat",
    api_key=os.environ["OPPER_API_KEY"],
)

r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": "What's a vector?"}],
)
print(r.choices[0].message.content)

Pick a model

When you set the model field, it is always provider-prefixed: openai/gpt-5.5, anthropic/claude-sonnet-4-6, gemini/gemini-2.5-pro. Browse the full set on the Models page.

You can call any model from any SDK. The Anthropic SDK can talk to a Google model, and the OpenAI SDK can talk to a Claude model. The provider prefix decides where the call routes; the SDK only sets the request shape.

If you don’t pass model, the call falls through to your Route rule (if any), then to model preference hints.
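To make the prefix convention concrete, here is a minimal sketch that splits a model identifier into its provider and model name. The `split_model` helper is hypothetical and only illustrates the naming scheme on the client side; it says nothing about how routing is implemented on the server.

```python
def split_model(model: str) -> tuple:
    """Split a provider-prefixed model id into (provider, model_name).

    Hypothetical helper for illustration; not part of any SDK.
    """
    # Split on the first slash only, so the model name keeps any later ones.
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


for model_id in (
    "openai/gpt-5-mini",
    "anthropic/claude-sonnet-4-6",
    "gemini/gemini-2.5-pro",
):
    provider, name = split_model(model_id)
    print(f"{provider:<10} {name}")
```

A check like this can catch a missing prefix before the request is sent, which is easier to debug than a routing error.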

Common parameters

Parameter	What it does
`temperature`	0 to 2. Lower is more deterministic.
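`top_p`	0 to 1. Nucleus sampling. Set either this or `temperature`, not both.
`max_tokens`	Maximum number of tokens in the response.
`stop`	A string or array of strings. Generation stops as soon as the model produces one of them.
`frequency_penalty` / `presence_penalty`	-2 to 2. Positive values penalize repeated tokens: `frequency_penalty` scales with how often a token has already appeared, `presence_penalty` applies once a token has appeared at all.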
`n`	How many response choices to generate. Default 1.

Reasoning models (the GPT-5 family, Claude with extended thinking) also accept reasoning_effort: "low" | "medium" | "high" to control how much the model “thinks” before answering.
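As a rough sketch of how these parameters combine, the hypothetical helper below collects optional arguments for a chat completion call. It checks the ranges listed above and refuses `temperature` and `top_p` together. The helper name and its validation are assumptions made for illustration; the API itself may accept or reject values differently.

```python
VALID_EFFORTS = {"low", "medium", "high"}


def sampling_kwargs(temperature=None, top_p=None, max_tokens=None,
                    stop=None, reasoning_effort=None):
    """Collect optional chat-completion parameters into a dict.

    Hypothetical helper: the range checks mirror the parameter list above
    and are this sketch's own.
    """
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if reasoning_effort is not None and reasoning_effort not in VALID_EFFORTS:
        raise ValueError("reasoning_effort must be low, medium, or high")

    candidates = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stop": stop,
        "reasoning_effort": reasoning_effort,
    }
    # Drop anything the caller left unset so the API defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}


params = sampling_kwargs(temperature=0.2, max_tokens=200, stop=["\n\n"])
print(params)
```

With the client from the minimal call, the result could be passed along as `client.chat.completions.create(model="openai/gpt-5-mini", messages=messages, **params)`, assuming your SDK version accepts each keyword.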

The response shape

Response

{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "model": "openai/gpt-5-mini",
  "created": 1716124800,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 42,
    "total_tokens": 53
  }
}

The two things you’ll read most:

choices[0].message.content is the assistant’s reply
choices[0].finish_reason is why the model stopped (stop, length, tool_calls, content_filter)
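A minimal sketch of reading those two fields, using a trimmed copy of the sample response as a plain dictionary. The `read_reply` helper is hypothetical; with the SDK object from the minimal call you would read the same fields as attributes, for example `r.choices[0].finish_reason`.

```python
import json

# Trimmed copy of the sample response, parsed into a dict.
SAMPLE = json.loads("""
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "A vector is..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 11, "completion_tokens": 42, "total_tokens": 53}
}
""")


def read_reply(response: dict) -> tuple:
    """Return (content, finish_reason) for the first choice."""
    choice = response["choices"][0]
    # Use .get() so a message without a content field yields None.
    return choice["message"].get("content"), choice["finish_reason"]


content, reason = read_reply(SAMPLE)
if reason == "length":
    print("Reply was cut off; consider raising max_tokens.")
elif reason == "tool_calls":
    print("The model asked to call a tool instead of answering.")
elif reason == "content_filter":
    print("The reply was withheld by a content filter.")
else:
    print(content)
```

Branching on `finish_reason` before using `content` avoids treating a truncated or filtered reply as a complete answer.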

What’s next

Conversations

Multi-turn chat with message history.

Tool calling

Let the model call your code.

Streaming

Stream tokens as they’re generated.

Structured output

Get JSON back instead of free text.
