Realtime protocol

The Opper Realtime API is a bidirectional WebSocket protocol — it can’t be fully represented in OpenAPI, so this page documents the event vocabulary. For the end-to-end journey (authentication, session lifecycle, tool flow, per-provider notes, billing), see the Realtime voice guide.

Connection

wss://api.opper.ai/v3/realtime

Two authentication paths are accepted on the upgrade request:

Server-side (bearer token): Authorization: Bearer <project-scoped runtime API key>.
Browser (ephemeral ticket): send the ticket as the opper-ticket.<value> subprotocol in the Sec-WebSocket-Protocol header (recommended), or as the ?ticket=<value> query parameter (fallback only, because credentials in URLs end up in access logs). Tickets are minted by POST /v3/realtime-sessions and are single-use.

A successful upgrade returns 101 Switching Protocols. The first frame the client sends must be session.start.
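
The two authentication paths can be sketched as a pair of helpers that prepare the inputs for the upgrade request. This is a minimal illustration rather than an official client: the function names and the returned dictionary shape are hypothetical, and the key and ticket values are placeholders. Only the URL, the Authorization header, the opper-ticket.<value> subprotocol, and the ?ticket= parameter come from the text above.

```python
from urllib.parse import urlencode

REALTIME_URL = "wss://api.opper.ai/v3/realtime"


def server_side_upgrade(api_key: str) -> dict:
    """Upgrade inputs for a backend holding a project-scoped runtime key."""
    return {
        "url": REALTIME_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "subprotocols": [],
    }


def browser_upgrade(ticket: str, use_query_fallback: bool = False) -> dict:
    """Upgrade inputs for a browser holding a single-use ticket."""
    if use_query_fallback:
        # Fallback only: the ticket becomes part of the URL and may be logged.
        return {
            "url": f"{REALTIME_URL}?{urlencode({'ticket': ticket})}",
            "headers": {},
            "subprotocols": [],
        }
    # Recommended: carry the ticket in Sec-WebSocket-Protocol.
    return {
        "url": REALTIME_URL,
        "headers": {},
        "subprotocols": [f"opper-ticket.{ticket}"],
    }
```

The returned values are meant to be handed to whichever WebSocket client is in use; most clients accept a URL, extra headers, and a list of subprotocols in some form.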

Client → server events

Event	Required fields	Purpose
`session.start`	`config.model`	Open the session. Must be the first frame.
`session.update`	`config`	Update session config mid-stream. Capability validation re-runs after any ticket overlay — unsupported `modalities` / `voice` / `reasoning_effort` values are rejected before reaching the upstream. OpenAI rejects mid-session changes to `input_transcription` and `input_transcription_model` (one-shot at start). Gemini ignores `session.update` entirely.
`audio.append`	`audio` (base64 PCM16)	Stream a chunk of microphone audio.
`audio.commit`	—	Mark end of speech turn (when not using server VAD).
`audio.clear`	—	Discard buffered uncommitted audio.
`text.input`	`text`	Send a typed user message.
`response.create`	—	Force a model response now.
`response.cancel`	—	Cancel an in-flight response.
`tool.result`	`tool_call_id`, `tool_result`	Return a function call result the model requested.
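
As an illustration of the table above, a client might route every outgoing event through one small builder that checks the required fields before serializing. The helper name build_frame and the REQUIRED_FIELDS mapping are hypothetical; the event names and required fields are copied from the table.

```python
import json

# Required top-level fields per client event, taken from the table above.
REQUIRED_FIELDS = {
    "session.start": ["config"],
    "session.update": ["config"],
    "audio.append": ["audio"],
    "audio.commit": [],
    "audio.clear": [],
    "text.input": ["text"],
    "response.create": [],
    "response.cancel": [],
    "tool.result": ["tool_call_id", "tool_result"],
}


def build_frame(event_type: str, **fields) -> str:
    """Serialize a client event to the JSON text sent in one WebSocket frame."""
    if event_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown client event: {event_type}")
    missing = [name for name in REQUIRED_FIELDS[event_type] if name not in fields]
    if missing:
        raise ValueError(f"{event_type} is missing {missing}")
    # session.start additionally requires config.model.
    if event_type == "session.start" and "model" not in fields["config"]:
        raise ValueError("session.start requires config.model")
    return json.dumps({"type": event_type, **fields})
```

For example, build_frame("text.input", text="hello") yields a frame ready to send, while build_frame("tool.result", tool_call_id="call_abc123") raises because tool_result is missing.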

`session.start` config

Field	Type	Notes
`model`	string	Required. Provider-prefixed id, e.g. `openai/gpt-realtime-2`, `xai/grok-voice-latest`, `gemini/gemini-3.1-flash-live-preview`.
`voice`	string	Provider-specific voice id.
`instructions`	string	System prompt.
`modalities`	string[]	Output modalities. OpenAI realtime accepts a single value — `["audio"]` or `["text"]`, not both. xAI and Gemini accept either form. Values outside the resolved model’s capabilities are rejected before the upstream is dialed.
`turn_detection`	object	`{ type, threshold, prefix_padding_ms, silence_duration_ms }`.
`tools`	object[]	Function-calling schema.
`reasoning_effort`	string	`gpt-realtime-2` only. `minimal` / `low` / `medium` / `high` / `xhigh`.
`input_transcription`	bool	Surface `transcript.committed` events. Off by default.
`input_transcription_model`	string	OpenAI only — selects the transcription model.
`output_transcription`	bool	Surface assistant-speech `text.delta` events. Off by default.
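
Some of the constraints in this table can be checked on the client before the frame is sent. The sketch below is one possible pre-check, not the server's validation logic: the function name is hypothetical, and it assumes the provider is the part of the model id before the first "/". The server remains the authority on what a model supports.

```python
ALLOWED_EFFORTS = {"minimal", "low", "medium", "high", "xhigh"}


def check_start_config(config: dict) -> list:
    """Return a list of problems found in a session.start config (empty if none)."""
    problems = []
    model = config.get("model")
    if not model:
        return ["config.model is required"]
    # Assumption: the provider is the part of the id before the first "/".
    provider = model.split("/", 1)[0]

    modalities = config.get("modalities")
    if provider == "openai" and modalities is not None and len(modalities) != 1:
        problems.append("OpenAI realtime accepts exactly one modality")

    effort = config.get("reasoning_effort")
    if effort is not None:
        if not model.endswith("gpt-realtime-2"):
            problems.append("reasoning_effort applies to gpt-realtime-2 only")
        elif effort not in ALLOWED_EFFORTS:
            problems.append(f"unknown reasoning_effort: {effort}")

    if provider != "openai" and "input_transcription_model" in config:
        problems.append("input_transcription_model is OpenAI only")
    return problems
```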

Server → client events

Event	Fields	Purpose
`session.started`	`session_id`, `input_sample_rate`, `output_sample_rate`, `audio_format`	Upstream session is live.
`audio.delta`	`audio` (base64 PCM16)	Assistant audio chunk at `output_sample_rate`.
`text.delta`	`delta`	Streaming assistant text (when transcription opt-in is on, or model emits text).
`transcript.committed`	`transcript`	User-speech transcript (when `input_transcription` is on).
`speech.started` / `speech.stopped`	—	VAD events for the user’s microphone.
`response.started` / `response.completed`	—	Lifecycle markers for an assistant turn.
`tool.call`	`tool_call_id`, `tool_name`, `tool_arguments`	Function call request from the model.
`session.terminating`	`error.code`, `error.message`	Server is closing the session. See termination codes below.
`session.ended`	—	Final frame before the upstream WS closes.
`error`	`error.code`, `error.message`	Provider or protocol error. For mid-session errors, `error.code` is either a stable Opper code for protocol-level rejections (`unsupported_modalities`, `invalid_config`, `provider_error`, etc.) or a pass-through provider code for upstream-originated errors. Treat unknown codes as recoverable and surface `error.message` to the user. Do not match against a fixed list of codes, because new codes can land as new providers are added.
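
One way to consume these events is a single routing function that turns each incoming frame into an action for the rest of the client. The sketch below is illustrative only: the function name and the action labels ("play_audio", "show_text", and so on) are hypothetical. It follows the guidance above by never branching on specific `error.code` values.

```python
import json


def route_server_frame(raw: str) -> tuple:
    """Classify one server frame as (action, payload) for the caller to act on."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "audio.delta":
        return ("play_audio", event["audio"])
    if kind == "text.delta":
        return ("show_text", event["delta"])
    if kind == "transcript.committed":
        return ("show_transcript", event["transcript"])
    if kind == "tool.call":
        return ("run_tool", event)
    if kind == "error":
        # Codes are open-ended: surface the message instead of matching codes.
        return ("show_error", event["error"]["message"])
    if kind in ("session.terminating", "session.ended"):
        return ("closing", event.get("error"))
    # Lifecycle and VAD markers, plus anything this sketch does not know.
    return ("ignore", event)
```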

Termination codes

session.terminating.error.code is a closed set, so clients can match against it exhaustively:

Code	Cause
`session_timeout`	Session exceeded the 30-minute max duration.
`idle_timeout`	No client traffic for 60 seconds.
`balance_exhausted`	Organization balance is empty.
`project_spend_cap_hit`	Project hit its configured spend cap.
`org_spend_cap_hit`	Organization hit its configured spend cap.
`billing_not_supported`	Account on a plan that doesn’t support realtime billing.
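
Because the set is closed, it can be modeled as an enum. The sketch below does that and adds one assumed policy on top: starting a new session is treated as reasonable after a timeout but not after a billing-related code. That policy and the names used are illustrative assumptions, not documented behavior.

```python
from enum import Enum


class TerminationCode(str, Enum):
    SESSION_TIMEOUT = "session_timeout"
    IDLE_TIMEOUT = "idle_timeout"
    BALANCE_EXHAUSTED = "balance_exhausted"
    PROJECT_SPEND_CAP_HIT = "project_spend_cap_hit"
    ORG_SPEND_CAP_HIT = "org_spend_cap_hit"
    BILLING_NOT_SUPPORTED = "billing_not_supported"


TIMEOUTS = {TerminationCode.SESSION_TIMEOUT, TerminationCode.IDLE_TIMEOUT}


def may_start_new_session(frame: dict) -> bool:
    """Assumed policy: reconnect after timeouts, stop on billing-related codes."""
    code = TerminationCode(frame["error"]["code"])  # ValueError if not in the set
    return code in TIMEOUTS
```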

Sample payloads

Every event is a JSON object over a text WebSocket frame. Field shapes below are the source of truth for client implementations and agent code generation — match these exactly.

Client → server

session.start

{
  "type": "session.start",
  "config": {
    "model": "openai/gpt-realtime-2",
    "voice": "marin",
    "instructions": "You are a concise voice assistant.",
    "modalities": ["audio"],
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "tools": [
      {
        "name": "lookup_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    ],
    "reasoning_effort": "low",
    "input_transcription": true,
    "input_transcription_model": "gpt-4o-mini-transcribe",
    "output_transcription": false
  }
}

session.update

{
  "type": "session.update",
  "config": { "voice": "verse", "instructions": "Be terser." }
}

audio.append

{ "type": "audio.append", "audio": "<base64-encoded PCM16 chunk>" }
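
A sketch of how the audio field might be produced from raw samples follows. It assumes little-endian, mono PCM16; this page only says "PCM16", so treat the byte order and channel count as assumptions. The helper name is hypothetical.

```python
import base64
import json
import struct


def audio_append_frame(samples: list) -> str:
    """Pack signed 16-bit samples and wrap them in an audio.append event."""
    # Assumption: little-endian, mono PCM16.
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps({
        "type": "audio.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })
```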

text.input

{ "type": "text.input", "text": "What's the weather in Stockholm?" }

tool.result

{
  "type": "tool.result",
  "tool_call_id": "call_abc123",
  "tool_result": { "temperature_c": 12, "conditions": "rain" }
}

Server → client

session.started

{
  "type": "session.started",
  "session_id": "sess_xyz",
  "input_sample_rate": 24000,
  "output_sample_rate": 24000,
  "audio_format": "pcm16"
}

audio.delta

{ "type": "audio.delta", "audio": "<base64-encoded PCM16 chunk>" }
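
On the receiving side, a client usually needs to know how much playback time each chunk represents, for example to schedule buffers. The hypothetical helper below derives that from the chunk size and the output_sample_rate reported in session.started, assuming mono PCM16 (2 bytes per sample).

```python
import base64


def delta_duration_ms(event: dict, output_sample_rate: int) -> float:
    """Playback length of one audio.delta chunk in milliseconds."""
    pcm = base64.b64decode(event["audio"])
    # Assumption: mono PCM16, so each sample occupies 2 bytes.
    sample_count = len(pcm) // 2
    return 1000.0 * sample_count / output_sample_rate
```

With the 24000 Hz rate from the session.started sample, a 480-byte chunk holds 240 samples and lasts 10 ms.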

text.delta

{ "type": "text.delta", "delta": "It's currently raining" }

transcript.committed

{
  "type": "transcript.committed",
  "transcript": "what's the weather in stockholm"
}

speech.started

{ "type": "speech.started" }

response.started

{ "type": "response.started" }

tool.call

{
  "type": "tool.call",
  "tool_call_id": "call_abc123",
  "tool_name": "lookup_weather",
  "tool_arguments": { "city": "Stockholm" }
}
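
The tool.call and tool.result payloads pair up through tool_call_id. A minimal sketch of answering a call is shown below; the TOOLS registry and the stand-in lookup_weather body are hypothetical, and a real handler would query an actual data source.

```python
import json


def lookup_weather(city: str) -> dict:
    """Stand-in tool body; a real handler would call a weather service."""
    return {"city": city, "temperature_c": 12, "conditions": "rain"}


# Hypothetical registry mapping tool names to local functions.
TOOLS = {"lookup_weather": lookup_weather}


def answer_tool_call(event: dict) -> str:
    """Run the requested function and build the matching tool.result frame."""
    handler = TOOLS[event["tool_name"]]
    result = handler(**event["tool_arguments"])
    return json.dumps({
        "type": "tool.result",
        "tool_call_id": event["tool_call_id"],  # echo the id from tool.call
        "tool_result": result,
    })
```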

session.terminating

{
  "type": "session.terminating",
  "error": {
    "code": "idle_timeout",
    "message": "No client traffic for 60 seconds."
  }
}

error

{
  "type": "error",
  "error": {
    "code": "unsupported_modalities",
    "message": "config.modalities contains \"text\" but model \"fake/audio-only\" does not support text output"
  }
}

Preflight rejections

Before the WebSocket upgrade, the endpoint returns standard HTTP status codes:

Status	Cause
`401 Unauthorized`	Missing, non-runtime, or project-less API key.
`402 Payment Required`	Balance exhausted, spend cap hit, or plan doesn’t support realtime.
`429 Too Many Requests`	Concurrent-session cap reached for this project (default 5).
`503 Service Unavailable`	Realtime endpoint not configured for the requested provider.
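
A client can map these statuses to coarse next steps before attempting another connection. The action labels in this sketch are hypothetical and the mapping is only one reasonable reading of the table above.

```python
def preflight_action(status: int) -> str:
    """Map the HTTP status of an upgrade attempt to a coarse client action."""
    if status == 101:
        return "connected"
    if status == 401:
        return "fix_credentials"  # missing, non-runtime, or project-less key
    if status == 402:
        return "check_billing"    # balance, spend cap, or plan
    if status == 429:
        return "retry_later"      # concurrent-session cap reached
    if status == 503:
        return "change_provider"  # provider not configured for realtime
    return "unexpected"
```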
