The simplest Chat API call: a list of messages goes in, a single response comes out. Every model supports this same shape. If you haven’t set up your SDK yet, start with Drop-in SDKs.

A minimal call

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opper.ai/v3/compat",
    api_key=os.environ["OPPER_API_KEY"],
)

r = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": "What's a vector?"}],
)
print(r.choices[0].message.content)

Pick a model

The model field is always provider-prefixed: openai/gpt-5.5, anthropic/claude-sonnet-4-6, gemini/gemini-2.5-pro. Browse the full set on the Models page.

You can call any model from any SDK: the Anthropic SDK can talk to a Google model, and the OpenAI SDK can talk to a Claude model. The provider prefix decides where the call routes; the SDK only sets the request shape.

If you don’t pass model, the call falls through to your Route rule (if any), then to model preference hints.
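As a minimal sketch of the routing described above, the same OpenAI client can be pointed at models from different providers by changing only the model string. The helper names `split_model` and `ask` are hypothetical and introduced here for illustration; the base URL and model names are the ones used on this page. The `openai` import is deferred so the prefix check can be tried without the SDK installed.

```python
import os
from typing import Optional, Tuple

BASE_URL = "https://api.opper.ai/v3/compat"  # same base URL as the minimal call


def split_model(model: str) -> Tuple[str, str]:
    """Split 'provider/name' into (provider, name).

    Hypothetical client-side check; the API does its own routing.
    """
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/name', got {model!r}")
    return provider, name


def ask(model: str, prompt: str) -> Optional[str]:
    """Send one user message to `model` through the OpenAI SDK."""
    from openai import OpenAI  # imported lazily; requires the openai package

    split_model(model)  # fail early if the provider prefix is missing
    client = OpenAI(base_url=BASE_URL, api_key=os.environ["OPPER_API_KEY"])
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content
```

With `OPPER_API_KEY` set, `ask("anthropic/claude-sonnet-4-6", "What's a vector?")` and `ask("gemini/gemini-2.5-pro", "What's a vector?")` would go through the same client; only the prefix differs.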

Common parameters

| Parameter | What it does |
| --- | --- |
| temperature | 0 to 2. Lower is more deterministic. |
| top_p | 0 to 1. Nucleus sampling. Don’t use with temperature at the same time. |
| max_tokens | Cap the response length. |
| stop | A string or array of strings. The model stops as soon as it sees one. |
| frequency_penalty / presence_penalty | -2 to 2. Penalize repeated tokens. |
| n | How many response choices to generate. Default 1. |

Reasoning models (the GPT-5 family, Claude with extended thinking) also accept reasoning_effort: "low" | "medium" | "high" to control how much the model “thinks” before answering.
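The ranges and the temperature/top_p rule above can be checked on the client before a request is sent. The sketch below is one way to do that; `sampling_params` is a hypothetical helper, not part of any SDK, and it only builds a dict of keyword arguments.

```python
from typing import Any, Dict, List, Optional, Union


def sampling_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stop: Optional[Union[str, List[str]]] = None,
    reasoning_effort: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate common parameters and return only the ones that were set."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if reasoning_effort not in (None, "low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")

    params = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stop": stop,
        "reasoning_effort": reasoning_effort,
    }
    # Drop unset values so model defaults apply.
    return {k: v for k, v in params.items() if v is not None}
```

The result can be splatted into the call from the minimal example, for instance `client.chat.completions.create(model=..., messages=..., **sampling_params(temperature=0.2, max_tokens=200))`.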

The response shape

Response
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "model": "openai/gpt-5-mini",
  "created": 1716124800,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 42,
    "total_tokens": 53
  }
}
The two things you’ll read most:
  • choices[0].message.content is the assistant’s reply
  • choices[0].finish_reason is why the model stopped (stop, length, tool_calls, content_filter)
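A short sketch of reading those two fields, using the sample response above as a plain dict. `read_reply` is a hypothetical helper, and how each finish_reason is handled here is an illustrative choice, not prescribed behavior. With the OpenAI Python SDK the response is an object, so this assumes it was first converted with `r.model_dump()`.

```python
import json

# The sample response from this page, parsed into a dict.
SAMPLE = json.loads("""
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "model": "openai/gpt-5-mini",
  "created": 1716124800,
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "A vector is..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 11, "completion_tokens": 42, "total_tokens": 53}
}
""")


def read_reply(response: dict) -> tuple:
    """Return (text, truncated) for the first choice.

    Raises when there is no text reply to read.
    """
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    if reason == "content_filter":
        raise RuntimeError("response was blocked by a content filter")
    if reason == "tool_calls":
        raise RuntimeError("model requested a tool call instead of replying")
    # "length" means max_tokens cut the reply short; "stop" is a normal finish.
    return choice["message"]["content"], reason == "length"


text, truncated = read_reply(SAMPLE)
print(text, "(truncated)" if truncated else "")
```

Checking finish_reason before using the content avoids treating a cut-off or filtered reply as a complete answer.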

What’s next

Conversations

Multi-turn chat with message history.

Tool calling

Let the model call your code.

Streaming

Stream tokens as they’re generated.

Structured output

Get JSON back instead of free text.