Streaming shows the response as it’s generated. Set stream: true and you get an iterator of chunks instead of waiting for the whole response. This works the same way it does in the OpenAI SDK. If you’ve streamed an OpenAI completion before, you’ve already written this code.

A working example

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opper.ai/v3/compat",
    api_key=os.environ["OPPER_API_KEY"],
)

stream = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": "Write a haiku about Tuesday."}],
    stream=True,
)

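for chunk in stream:
    # Some chunks can arrive with no choices; skip them
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:  # content may be empty or None on some chunks
        print(delta, end="", flush=True)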
print()

What’s in each chunk

Each streamed chunk looks like a normal chat completion, except:
  • choices[0].delta carries the new content (instead of choices[0].message).
  • delta.content is the next piece of text. It may be empty or null on some chunks, so check it before printing.
  • delta.tool_calls carries fragments of a tool call as the model assembles it (see below).
  • finish_reason is null while streaming. The final chunk sets it to stop, length, tool_calls, or content_filter.
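
The fields above can be exercised without a network call. The following is a minimal sketch, assuming chunks shaped like the ones just described. The fake_chunk helper and the collect function are hypothetical names introduced for illustration, and the stand-in objects are built with SimpleNamespace rather than returned by the API.

```python
from types import SimpleNamespace


def fake_chunk(content=None, finish_reason=None):
    """Build a stand-in for one streamed chunk (hypothetical helper, not an SDK call)."""
    delta = SimpleNamespace(content=content, tool_calls=None)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


def collect(stream):
    """Join the delta.content pieces and remember the last non-null finish_reason."""
    parts = []
    finish_reason = None
    for chunk in stream:
        if not chunk.choices:
            continue  # nothing to read on this chunk
        choice = chunk.choices[0]
        if choice.delta.content:  # skips both "" and None
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


# Simulated stream: an empty first piece, two text pieces, then the final chunk
chunks = [
    fake_chunk(content=""),
    fake_chunk(content="Tuesday "),
    fake_chunk(content="again"),
    fake_chunk(finish_reason="stop"),
]
text, reason = collect(chunks)
print(text, reason)
```

Passing the real stream object to collect in place of the simulated list should behave the same way, provided the chunks follow the shape described above.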

Streaming tool calls

When the model decides to call a tool, the arguments arrive as JSON fragments across multiple chunks. Concatenate them as they come in, then parse once the call is complete.
import json
from collections import defaultdict

stream = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=messages,
    tools=tools,
    stream=True,
)

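# Accumulate each tool call's name and argument fragments by index
name_by_index = {}
args_by_index = defaultdict(str)

for chunk in stream:
    if not chunk.choices:
        continue  # some chunks can arrive with no choices
    for tc in chunk.choices[0].delta.tool_calls or []:
        if tc.function is None:
            continue
        # The name usually arrives once, on the first fragment of a call,
        # so store it separately instead of overwriting it with later fragments
        if tc.function.name:
            name_by_index[tc.index] = tc.function.name
        if tc.function.arguments:
            args_by_index[tc.index] += tc.function.arguments

# After the loop, each entry in args_by_index is a complete JSON string
for index, name in name_by_index.items():
    # A call with no arguments may stream an empty string; treat it as {}
    parsed = json.loads(args_by_index[index] or "{}")
    print(f"{name}({parsed})")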
This lets you show a “calling search…” UI as the call assembles, rather than waiting for the model to finish.
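
One way to drive that kind of UI is to fire a notice the first time a tool call's name appears, while the arguments keep accumulating. The sketch below is an illustration under assumptions: fragment and announce_calls are hypothetical helpers, and the fragments are simulated with SimpleNamespace to mirror the shape of delta.tool_calls entries.

```python
from types import SimpleNamespace


def fragment(index, name=None, arguments=None):
    """Stand-in for one delta.tool_calls entry (hypothetical helper)."""
    function = SimpleNamespace(name=name, arguments=arguments)
    return SimpleNamespace(index=index, function=function)


def announce_calls(fragments, notify):
    """Call notify(name) once per tool call, as soon as its name shows up.

    Returns the raw argument strings keyed by tool call index.
    """
    seen = set()
    args = {}
    for tc in fragments:
        if tc.function is None:
            continue
        if tc.function.name and tc.index not in seen:
            seen.add(tc.index)
            notify(tc.function.name)
        if tc.function.arguments:
            args[tc.index] = args.get(tc.index, "") + tc.function.arguments
    return args


# Simulated fragments: the name first, then the arguments split in two
notices = []
raw_args = announce_calls(
    [
        fragment(0, name="search"),
        fragment(0, arguments='{"query": '),
        fragment(0, arguments='"tuesday"}'),
    ],
    notify=lambda name: notices.append(f"calling {name}…"),
)
print(notices, raw_args)
```

In a real loop, the fragments would come from chunk.choices[0].delta.tool_calls on each chunk, and notify would update the interface instead of appending to a list.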

Errors mid-stream

If something goes wrong after streaming has started (a Guard rule rejects the output, the upstream model fails, a timeout fires), the stream ends with an error chunk. Always handle exceptions around the iteration.
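
A minimal sketch of that pattern follows. It keeps whatever text arrived before the failure so the caller can decide whether to show it, retry, or discard it. The read_stream and failing_stream names are hypothetical, the failure is simulated with a generator, and the broad except Exception is an assumption: with the OpenAI Python SDK, openai.APIError is the likely type to catch, but verify that against the errors you actually see.

```python
from types import SimpleNamespace


def read_stream(stream):
    """Return (text_so_far, error); error is None if the stream finished cleanly."""
    parts = []
    try:
        for chunk in stream:
            if not chunk.choices:
                continue
            piece = chunk.choices[0].delta.content
            if piece:
                parts.append(piece)
    except Exception as exc:  # assumption: narrow this to your SDK's error type
        return "".join(parts), exc
    return "".join(parts), None


def failing_stream():
    """Hypothetical stream that yields one chunk, then fails mid-stream."""
    delta = SimpleNamespace(content="Partial ")
    yield SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=None)])
    raise RuntimeError("upstream model failed")


text, error = read_stream(failing_stream())
if error is not None:
    print(f"stream failed after {len(text)} characters: {error}")
```

Returning the partial text alongside the error, rather than raising, is a design choice for this sketch; re-raising after cleanup works just as well if the caller should not see incomplete output.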

What’s next

Tool calling

The full tool-use round trip.

Conversations

Multi-turn chat with message history.

JSON API streaming

Field-level streaming instead of token-level.