A conversation is just a list of messages. You keep adding to it, send the whole list every time, and the model has full context of everything that came before.

The message shape

Every message has a role and content.
Role       What it is
system     Instructions for the model. Usually one message at the start of the conversation.
user       What the user typed.
assistant  What the model said. You append these from previous responses.
tool       The result of a tool call (when using Tool calling).
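
To make the four roles concrete, here is a small sketch of a message list that uses each of them. It assumes the OpenAI-compatible message shape, where a tool result refers back to the assistant's tool call by id. The tool name `get_weather`, its arguments, and the id `call_1` are made-up placeholders, not part of any real API.

```python
import json

# Illustrative message list covering all four roles.
# Assumption: OpenAI-compatible shape for tool calls and tool results.
# "get_weather" and "call_1" are hypothetical placeholders.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What's the weather in Oslo?"},
    {
        "role": "assistant",
        "content": None,  # no text, the model asked for a tool instead
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Oslo"}),
                },
            }
        ],
    },
    # The tool result points back at the call it answers.
    {"role": "tool", "tool_call_id": "call_1", "content": json.dumps({"temp_c": 4})},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'tool']
```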

A working multi-turn example

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opper.ai/v3/compat",
    api_key=os.environ["OPPER_API_KEY"],
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What's a vector?"},
]

# Turn 1
r = client.chat.completions.create(model="openai/gpt-5-mini", messages=messages)
reply = r.choices[0].message.content
messages.append({"role": "assistant", "content": reply})
print("Assistant:", reply)

# Turn 2 — the model remembers turn 1 because we sent the whole list
messages.append({"role": "user", "content": "Give me an example in 3D."})
r = client.chat.completions.create(model="openai/gpt-5-mini", messages=messages)
print("Assistant:", r.choices[0].message.content)
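
The append, send, append pattern above repeats on every turn, so it can be wrapped in a small helper. This is a minimal sketch, not part of any SDK: `chat_turn` and `fake_complete` are hypothetical names, and the model call is passed in as a function so the example runs without a network connection.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(
    messages: list[Message],
    user_text: str,
    complete: Callable[[list[Message]], str],
) -> str:
    """Append the user message, get a reply for the whole list, append the reply."""
    messages.append({"role": "user", "content": user_text})
    reply = complete(messages)  # the full history is sent every time
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages: list[Message]) -> str:
    """Stand-in for the real API call: reports how many user turns it has seen."""
    return f"(reply #{sum(m['role'] == 'user' for m in messages)})"


history: list[Message] = [{"role": "system", "content": "You are a concise assistant."}]
chat_turn(history, "What's a vector?", fake_complete)
chat_turn(history, "Give me an example in 3D.", fake_complete)

print([m["role"] for m in history])
# ['system', 'user', 'assistant', 'user', 'assistant']
```

With the client from the example above, `complete` could be a function that calls `client.chat.completions.create(model="openai/gpt-5-mini", messages=msgs)` and returns `choices[0].message.content`.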

System prompt

The first system message sets the model’s persona and ground rules. It stays in the conversation, and you only write it once.
{"role": "system", "content": "You are a customer support agent for Acme Corp. Always cite a ticket number when referring to a past issue."}
Keep it short. The system prompt is resent on every turn, so a long one adds token cost each time, and instructions buried in a long prompt are more likely to be ignored.

Stop reasons

The response tells you why the model stopped:
finish_reason   Meaning
stop            Model finished its answer naturally.
length          Hit max_tokens. Response may be truncated.
tool_calls      Model wants to call a tool (see Tool calling).
content_filter  A Guard rule blocked or modified the output.
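
One way to act on these values is to branch on `finish_reason` before using the reply. The sketch below is illustrative only: `describe_stop` and its return labels are hypothetical, and the response is faked with `SimpleNamespace` so the example runs offline. It assumes the response exposes `choices[0].finish_reason` as in the OpenAI SDK.

```python
from types import SimpleNamespace


def describe_stop(response) -> str:
    """Map finish_reason to a label the caller can branch on (labels are illustrative)."""
    reason = response.choices[0].finish_reason
    if reason == "stop":
        return "complete"            # safe to use the reply as-is
    if reason == "length":
        return "truncated"           # hit max_tokens, reply may be cut off
    if reason == "tool_calls":
        return "needs_tool_results"  # run the tools, append results, call again
    if reason == "content_filter":
        return "filtered"            # output was blocked or modified
    return f"unknown:{reason}"


# Fake response object shaped like the SDK's, for demonstration only.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="A vector is"),
        )
    ]
)

print(describe_stop(fake_response))  # truncated
```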

Keeping history under control

Messages add up fast. A few options:
  • Trim older turns when the conversation gets long. Keep the system message and the most recent N turns.
  • Summarize older turns into a single assistant message (“earlier you discussed: X, Y, Z”).
  • Use a separate JSON API call to extract just the bits worth remembering, then drop the rest. See JSON API.
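
The first two options can be sketched as follows. This is a rough illustration under a few assumptions: a "turn" starts at each user message, so any assistant or tool messages that follow stay with it; `split_history`, `trim_history`, and `summarize_history` are hypothetical helper names; and `summarize` is a function you supply (for example, another model call).

```python
def split_history(messages, keep_turns):
    """Split into (system messages, older turns, recent turns).

    A turn starts at a user message, so cutting there never separates
    a tool result from the assistant message that requested it.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    starts = [i for i, m in enumerate(rest) if m["role"] == "user"]
    if keep_turns <= 0:
        return system, rest, []
    if len(starts) <= keep_turns:
        return system, [], rest
    cut = starts[-keep_turns]
    return system, rest[:cut], rest[cut:]


def trim_history(messages, keep_turns):
    """Option 1: drop older turns, keep system messages and the last N turns."""
    system, _older, recent = split_history(messages, keep_turns)
    return system + recent


def summarize_history(messages, keep_turns, summarize):
    """Option 2: replace older turns with a single assistant summary message."""
    system, older, recent = split_history(messages, keep_turns)
    if not older:
        return system + recent
    summary = {
        "role": "assistant",
        "content": "Earlier you discussed: " + summarize(older),
    }
    return system + [summary] + recent


history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
    {"role": "assistant", "content": "a3"},
]

trimmed = trim_history(history, keep_turns=2)
print([m["content"] for m in trimmed[1:]])  # ['q2', 'a2', 'q3', 'a3']


def list_user_topics(older):
    """Toy summarizer: joins the user messages. A real one might call the model."""
    return ", ".join(m["content"] for m in older if m["role"] == "user")


summarized = summarize_history(history, keep_turns=1, summarize=list_user_topics)
print(summarized[1]["content"])  # Earlier you discussed: q1, q2
```

Neither helper mutates the original list, so the full history can still be stored elsewhere while the shortened version is what gets sent.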

What’s next

Tool calling

Let the model invoke your tools mid-conversation.

Streaming

Stream the response token-by-token.

Vision & PDFs

Send images and documents as message content.

Drop-in SDKs

The auth and base URL setup.