The Opper Realtime API is a bidirectional WebSocket protocol that can’t be fully represented in OpenAPI, so this page documents the event vocabulary. For the end-to-end journey (authentication, session lifecycle, tool flow, per-provider notes, billing), see the Realtime voice guide.
Connection
- Server-side (bearer token): Authorization: Bearer <project-scoped runtime API key>.
- Browser (ephemeral ticket): Sec-WebSocket-Protocol: opper-ticket.<value> subprotocol header (recommended), or ?ticket=<value> query parameter (fallback; bearer credentials in URLs end up in access logs). Tickets are minted by POST /v3/realtime-sessions and are single-use.

A successful upgrade returns 101 Switching Protocols. The first frame the client sends must be session.start.
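The two authentication paths above can be sketched as a small helper that prepares the arguments for a WebSocket constructor. This is a minimal sketch under stated assumptions, not an official client: `RealtimeAuth`, `ConnectArgs`, and `connectArgs` are hypothetical names, and the endpoint URL is a caller-supplied parameter because this page does not state it.

```typescript
// Hypothetical helper: these names are illustrative, not part of an Opper SDK.
type RealtimeAuth =
  | { kind: "bearer"; apiKey: string } // server-side: project-scoped runtime API key
  | { kind: "ticket"; ticket: string }; // browser: single-use ticket from POST /v3/realtime-sessions

interface ConnectArgs {
  url: string; // endpoint URL supplied by the caller (not given on this page)
  protocols: string[]; // sent as the Sec-WebSocket-Protocol header
  headers: Record<string, string>; // only usable where the WebSocket client allows custom headers
}

function connectArgs(endpoint: string, auth: RealtimeAuth): ConnectArgs {
  if (auth.kind === "bearer") {
    return {
      url: endpoint,
      protocols: [],
      headers: { Authorization: `Bearer ${auth.apiKey}` },
    };
  }
  // Prefer the subprotocol over ?ticket=<value>, which would end up in access logs.
  return {
    url: endpoint,
    protocols: [`opper-ticket.${auth.ticket}`],
    headers: {},
  };
}
```

In a browser, the result would typically be passed as `new WebSocket(args.url, args.protocols)`, since the browser WebSocket constructor does not accept custom headers. Server-side clients that support request headers can pass `args.headers` instead.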
Client → server events
| Event | Required fields | Purpose |
|---|---|---|
| session.start | config.model | Open the session. Must be the first frame. |
| session.update | config | Update session config mid-stream. Capability validation re-runs after any ticket overlay: unsupported modalities, voice, or reasoning_effort values are rejected before reaching the upstream. OpenAI rejects mid-session changes to input_transcription and input_transcription_model (one-shot at start). Gemini ignores session.update entirely. |
| audio.append | audio (base64 PCM16) | Stream a chunk of microphone audio. |
| audio.commit | (none) | Mark end of speech turn (when not using server VAD). |
| audio.clear | (none) | Discard buffered uncommitted audio. |
| text.input | text | Send a typed user message. |
| response.create | (none) | Force a model response now. |
| response.cancel | (none) | Cancel an in-flight response. |
| tool.result | tool_call_id, tool_result | Return a function call result the model requested. |
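The client events in the table can be modelled as a discriminated union, with a small encoder that enforces the session.start-first rule. This is a sketch under assumptions: the table does not name the field that carries the event name, so `type` is assumed here; `FrameEncoder` and `pcm16ToBase64` are hypothetical names; and little-endian sample order is assumed, since the page only says "base64 PCM16".

```typescript
// Assumption: the event name travels in a "type" field (not confirmed by the table above).
type ClientEvent =
  | { type: "session.start"; config: { model: string } & Record<string, unknown> }
  | { type: "session.update"; config: Record<string, unknown> }
  | { type: "audio.append"; audio: string }
  | { type: "audio.commit" | "audio.clear" | "response.create" | "response.cancel" }
  | { type: "text.input"; text: string }
  | { type: "tool.result"; tool_call_id: string; tool_result: unknown };

// Hypothetical wrapper that refuses to serialize anything before session.start.
class FrameEncoder {
  private started = false;

  encode(event: ClientEvent): string {
    if (!this.started && event.type !== "session.start") {
      throw new Error(`first frame must be session.start, got ${event.type}`);
    }
    this.started = true;
    return JSON.stringify(event);
  }
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes 16-bit samples as base64 for audio.append.
// Assumption: little-endian byte order.
function pcm16ToBase64(samples: Int16Array): string {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  samples.forEach((s, i) => view.setInt16(i * 2, s, true));

  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] || 0;
    const b1 = i + 1 < bytes.length ? bytes[i + 1] || 0 : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] || 0 : 0;
    const n = (b0 << 16) | (b1 << 8) | b2;
    out += B64.charAt((n >> 18) & 63) + B64.charAt((n >> 12) & 63);
    out += i + 1 < bytes.length ? B64.charAt((n >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64.charAt(n & 63) : "=";
  }
  return out;
}
```

A microphone chunk would then be sent as `encoder.encode({ type: "audio.append", audio: pcm16ToBase64(chunk) })`. The base64 encoder is written by hand only to keep the sketch free of runtime-specific helpers.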
session.start config
| Field | Type | Notes |
|---|---|---|
| model | string | Required. Provider-prefixed id, e.g. openai/gpt-realtime-2, xai/grok-voice-latest, gemini/gemini-3.1-flash-live-preview. |
| voice | string | Provider-specific voice id. |
| instructions | string | System prompt. |
| modalities | string[] | Output modalities. OpenAI realtime accepts a single value, either ["audio"] or ["text"], not both. xAI and Gemini accept either form. Values outside the resolved model’s capabilities are rejected before the upstream is dialed. |
| turn_detection | object | { type, threshold, prefix_padding_ms, silence_duration_ms }. |
| tools | object[] | Function-calling schema. |
| reasoning_effort | string | gpt-realtime-2 only. minimal / low / medium / high / xhigh. |
| input_transcription | bool | Surface transcript.committed events. Off by default. |
| input_transcription_model | string | OpenAI only. Selects the transcription model. |
| output_transcription | bool | Surface assistant-speech text.delta events. Off by default. |
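The per-provider notes in the config table lend themselves to a client-side precheck before a session.start frame is sent. The following is a minimal sketch, assuming the field types shown (the table gives names but not the types of the turn_detection members, so those are guesses); `SessionConfig` and `checkConfig` are hypothetical names, and the server remains the authority on what a model supports.

```typescript
type Modality = "audio" | "text";

// Field names follow the table above; optionality and nested types are assumptions.
interface SessionConfig {
  model: string; // provider-prefixed, e.g. "openai/gpt-realtime-2"
  voice?: string;
  instructions?: string;
  modalities?: Modality[];
  turn_detection?: {
    type: string;
    threshold?: number;
    prefix_padding_ms?: number;
    silence_duration_ms?: number;
  };
  tools?: object[];
  reasoning_effort?: "minimal" | "low" | "medium" | "high" | "xhigh";
  input_transcription?: boolean;
  input_transcription_model?: string;
  output_transcription?: boolean;
}

// Hypothetical precheck: returns human-readable problems, empty when none are found.
function checkConfig(config: SessionConfig): string[] {
  const problems: string[] = [];
  const provider = config.model.split("/")[0];

  if (provider === "openai" && config.modalities && config.modalities.length !== 1) {
    problems.push('OpenAI realtime accepts a single modality: ["audio"] or ["text"]');
  }
  if (config.reasoning_effort && config.model !== "openai/gpt-realtime-2") {
    problems.push("reasoning_effort applies to gpt-realtime-2 only");
  }
  if (config.input_transcription_model && provider !== "openai") {
    problems.push("input_transcription_model is OpenAI only");
  }
  return problems;
}
```

A precheck like this only catches the cases the table spells out. Capability validation against the resolved model still happens server-side and can reject values this function accepts.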
Server → client events
| Event | Fields | Purpose |
|---|---|---|
| session.started | session_id, input_sample_rate, output_sample_rate, audio_format | Upstream session is live. |
| audio.delta | audio (base64 PCM16) | Assistant audio chunk at output_sample_rate. |
| text.delta | delta | Streaming assistant text (when output transcription is enabled, or the model emits text). |
| transcript.committed | transcript | User-speech transcript (when input_transcription is on). |
| speech.started / speech.stopped | (none) | VAD events for the user’s microphone. |
| response.started / response.completed | (none) | Lifecycle markers for an assistant turn. |
| tool.call | tool_call_id, tool_name, tool_arguments | Function call request from the model. |
| session.terminating | error.code, error.message | Server is closing the session. See termination codes below. |
| session.ended | (none) | Final frame before the upstream WS closes. |
| error | error.code, error.message | Provider or protocol error. Mid-session errors: error.code is a stable Opper code for protocol-level rejections (unsupported_modalities, invalid_config, provider_error, etc.) or a pass-through provider code for upstream-originated errors. Treat unknown codes as recoverable and surface error.message to the user. Do not enumerate these codes, because new ones can land as new providers are added. |
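One way to consume the server events above is a pure handler that parses a frame and returns any frames to send back. This is a sketch under assumptions, not the reference client: the `type` discriminator, the numeric sample rates, and the shape of `tool_arguments` are not specified by the table; `handleFrame` and `ToolHandler` are hypothetical; and the fallback result for an unregistered tool is invented for illustration.

```typescript
// Assumption: the event name travels in a "type" field; other field names come from the table.
type ServerEvent =
  | { type: "session.started"; session_id: string; input_sample_rate: number; output_sample_rate: number; audio_format: string }
  | { type: "audio.delta"; audio: string }
  | { type: "text.delta"; delta: string }
  | { type: "transcript.committed"; transcript: string }
  | { type: "speech.started" | "speech.stopped" | "response.started" | "response.completed" | "session.ended" }
  | { type: "tool.call"; tool_call_id: string; tool_name: string; tool_arguments: unknown }
  | { type: "session.terminating" | "error"; error: { code: string; message: string } };

type ToolHandler = (args: unknown) => unknown;

// Returns the frames the client should send in response to one server frame.
function handleFrame(raw: string, tools: Record<string, ToolHandler>, log: string[]): string[] {
  const event = JSON.parse(raw) as ServerEvent;
  switch (event.type) {
    case "text.delta":
      log.push(event.delta);
      return [];
    case "tool.call": {
      const tool = tools[event.tool_name];
      // The error object for an unknown tool is an illustrative choice, not a documented shape.
      const tool_result = tool
        ? tool(event.tool_arguments)
        : { error: `unknown tool ${event.tool_name}` };
      return [JSON.stringify({ type: "tool.result", tool_call_id: event.tool_call_id, tool_result })];
    }
    case "error":
      // Unknown codes are treated as recoverable: surface the message and keep the session open.
      log.push(`error ${event.error.code}: ${event.error.message}`);
      return [];
    default:
      return [];
  }
}
```

Keeping the handler free of socket calls makes it easy to unit test. The caller would write each returned string to the WebSocket.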
Termination codes
session.terminating.error.code is a closed set; you can enumerate against it:
| Code | Cause |
|---|---|
| session_timeout | Session exceeded the 30-minute max duration. |
| idle_timeout | No client traffic for 60 seconds. |
| balance_exhausted | Organization balance is empty. |
| project_spend_cap_hit | Project hit its configured spend cap. |
| org_spend_cap_hit | Organization hit its configured spend cap. |
| billing_not_supported | Account on a plan that doesn’t support realtime billing. |
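Because the termination codes form a closed set, a client can model them as a literal union and let the compiler check that every code is handled. The sketch below assumes a simple policy (reconnect after timeouts, stop and surface a billing problem otherwise) that this page does not prescribe; `nextStep` and `isTerminationCode` are hypothetical names.

```typescript
const TERMINATION_CODES = [
  "session_timeout",
  "idle_timeout",
  "balance_exhausted",
  "project_spend_cap_hit",
  "org_spend_cap_hit",
  "billing_not_supported",
] as const;

type TerminationCode = (typeof TERMINATION_CODES)[number];
type NextStep = "reconnect" | "fix_billing";

// Narrows an arbitrary string (e.g. error.code from session.terminating) to the closed set.
function isTerminationCode(value: string): value is TerminationCode {
  return (TERMINATION_CODES as readonly string[]).indexOf(value) !== -1;
}

// Illustrative policy only: which codes warrant a reconnect is an assumption.
function nextStep(code: TerminationCode): NextStep {
  switch (code) {
    case "session_timeout":
    case "idle_timeout":
      return "reconnect";
    case "balance_exhausted":
    case "project_spend_cap_hit":
    case "org_spend_cap_hit":
    case "billing_not_supported":
      return "fix_billing";
    default: {
      // Compile-time exhaustiveness check: adding a code to the list breaks this line.
      const unreachable: never = code;
      throw new Error("unhandled termination code: " + String(unreachable));
    }
  }
}
```

The same exhaustive-switch pattern is deliberately not applied to mid-session error codes, which are open-ended.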
Sample payloads
Every event is a JSON object over a text WebSocket frame. Field shapes below are the source of truth for client implementations and agent code generation; match these exactly.
Client → server
session.start
session.update
audio.append
text.input
tool.result
Server → client
session.started
audio.delta
text.delta
transcript.committed
speech.started
response.started
tool.call
session.terminating
error
Preflight rejections
Before the WebSocket upgrade, the endpoint returns standard HTTP status codes:
| Status | Cause |
|---|---|
| 401 Unauthorized | Missing, non-runtime, or project-less API key. |
| 402 Payment Required | Balance exhausted, spend cap hit, or plan doesn’t support realtime. |
| 429 Too Many Requests | Concurrent-session cap reached for this project (default 5). |
| 503 Service Unavailable | Realtime endpoint not configured for the requested provider. |
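A client that can observe the upgrade response might map these statuses to a next action along the following lines. This is a hedged sketch: `onPreflightStatus` and the action names are hypothetical, and the exponential backoff for 429 (1 second doubling up to 30 seconds) is an arbitrary assumption, not Opper guidance.

```typescript
type PreflightAction =
  | { action: "fix_credentials" } // 401: missing, non-runtime, or project-less key
  | { action: "fix_billing" } // 402: balance, spend cap, or plan
  | { action: "retry"; delayMs: number } // 429: concurrent-session cap reached
  | { action: "unavailable" } // 503: provider not configured
  | { action: "unknown"; status: number };

// attempt is zero-based; the delay values are illustrative assumptions.
function onPreflightStatus(status: number, attempt: number): PreflightAction {
  switch (status) {
    case 401:
      return { action: "fix_credentials" };
    case 402:
      return { action: "fix_billing" };
    case 429:
      return { action: "retry", delayMs: Math.min(30000, 1000 * Math.pow(2, attempt)) };
    case 503:
      return { action: "unavailable" };
    default:
      return { action: "unknown", status };
  }
}
```

Only 429 is treated as retryable here, on the reasoning that a concurrent-session slot can free up when another session ends, whereas the other statuses need a configuration or billing change first.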
See also
- Realtime voice guide — full journey including authentication, session lifecycle, tool flow, billing, per-provider notes, and a working TypeScript example.
- brainstorm-time cookbook — complete end-to-end voice app.