# Realtime protocol

> WebSocket event vocabulary for the Opper Realtime API. See the full guide for end-to-end usage.

The Opper Realtime API is a bidirectional WebSocket protocol — it can't be fully represented in OpenAPI, so this page documents the event vocabulary. For the end-to-end journey (authentication, session lifecycle, tool flow, per-provider notes, billing), see the **[Realtime voice guide](/capabilities/realtime)**.

## Connection

```
wss://api.opper.ai/v3/realtime
```

Two authentication paths are accepted on the upgrade request:

* **Server-side (bearer token):** `Authorization: Bearer <project-scoped runtime API key>`.
* **Browser (ephemeral ticket):** `Sec-WebSocket-Protocol: opper-ticket.<value>` subprotocol header (recommended), or `?ticket=<value>` query parameter (fallback — bearer credentials in URLs end up in access logs). Tickets are minted by [`POST /v3/realtime-sessions`](/v3-api-reference/realtime/create-realtime-session) and are single-use.

A successful upgrade returns `101 Switching Protocols`. The first frame the client sends must be `session.start`.
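
A minimal sketch of assembling the upgrade request for either authentication path. `RealtimeAuth` and `buildConnection` are hypothetical names, not part of an Opper SDK. The code only builds the URL, headers, and subprotocol list; it does not open a socket.

```typescript
// Hypothetical helper: builds upgrade parameters for the Opper Realtime endpoint.
const REALTIME_URL = "wss://api.opper.ai/v3/realtime";

type RealtimeAuth =
  | { kind: "bearer"; apiKey: string } // server-side: project-scoped runtime API key
  | { kind: "ticket"; ticket: string; viaQuery?: boolean }; // browser: single-use ticket

interface ConnectionParams {
  url: string;
  headers: Record<string, string>; // only settable from server-side clients
  protocols: string[]; // second argument of `new WebSocket(url, protocols)`
}

function buildConnection(auth: RealtimeAuth): ConnectionParams {
  if (auth.kind === "bearer") {
    return { url: REALTIME_URL, headers: { Authorization: `Bearer ${auth.apiKey}` }, protocols: [] };
  }
  if (auth.viaQuery) {
    // Fallback only: the ticket ends up in the URL and therefore in access logs.
    return { url: `${REALTIME_URL}?ticket=${encodeURIComponent(auth.ticket)}`, headers: {}, protocols: [] };
  }
  // Recommended browser path: the ticket travels in the Sec-WebSocket-Protocol header.
  return { url: REALTIME_URL, headers: {}, protocols: [`opper-ticket.${auth.ticket}`] };
}
```

In a browser you would pass `url` and `protocols` to `new WebSocket(url, protocols)` and send `session.start` from the `open` handler, since it must be the first frame. The browser `WebSocket` constructor cannot set custom headers, which is why the bearer path is server-side only; Node clients such as the `ws` package accept a `headers` option.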

## Client → server events

| Event             | Required fields               | Purpose                                                                                                                                                                                                                                                                                                                                                             |
| ----------------- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `session.start`   | `config.model`                | Open the session. Must be the first frame.                                                                                                                                                                                                                                                                                                                          |
| `session.update`  | `config`                      | Update session config mid-stream. Capability validation re-runs after any ticket overlay — unsupported `modalities` / `voice` / `reasoning_effort` values are rejected before reaching the upstream. **OpenAI rejects mid-session changes** to `input_transcription` and `input_transcription_model` (one-shot at start). Gemini ignores `session.update` entirely. |
| `audio.append`    | `audio` (base64 PCM16)        | Stream a chunk of microphone audio.                                                                                                                                                                                                                                                                                                                                 |
| `audio.commit`    | —                             | Mark end of speech turn (when not using server VAD).                                                                                                                                                                                                                                                                                                                |
| `audio.clear`     | —                             | Discard buffered uncommitted audio.                                                                                                                                                                                                                                                                                                                                 |
| `text.input`      | `text`                        | Send a typed user message.                                                                                                                                                                                                                                                                                                                                          |
| `response.create` | —                             | Force a model response now.                                                                                                                                                                                                                                                                                                                                         |
| `response.cancel` | —                             | Cancel an in-flight response.                                                                                                                                                                                                                                                                                                                                       |
| `tool.result`     | `tool_call_id`, `tool_result` | Return a function call result the model requested.                                                                                                                                                                                                                                                                                                                  |
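
One way to model this table in TypeScript is a discriminated union keyed on `type`, plus a small gate that enforces the first-frame rule locally. `ClientEvent` and `ClientEventGate` are illustrative names; the server still performs its own validation.

```typescript
// Loose config shape; see the session.start config table for the individual fields.
type SessionConfig = { model?: string; [key: string]: unknown };

type ClientEvent =
  | { type: "session.start"; config: SessionConfig & { model: string } }
  | { type: "session.update"; config: SessionConfig }
  | { type: "audio.append"; audio: string } // base64 PCM16
  | { type: "audio.commit" }
  | { type: "audio.clear" }
  | { type: "text.input"; text: string }
  | { type: "response.create" }
  | { type: "response.cancel" }
  | { type: "tool.result"; tool_call_id: string; tool_result: unknown };

// Every event is sent as JSON in a text frame.
function encodeClientEvent(event: ClientEvent): string {
  return JSON.stringify(event);
}

// Refuses to encode anything before session.start has been sent.
class ClientEventGate {
  private started = false;

  encode(event: ClientEvent): string {
    if (!this.started && event.type !== "session.start") {
      throw new Error(`first frame must be session.start, got ${event.type}`);
    }
    this.started = true;
    return encodeClientEvent(event);
  }
}
```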

### `session.start` config

| Field                       | Type      | Notes                                                                                                                                                                                                                            |
| --------------------------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`                     | string    | Required. Provider-prefixed id, e.g. `openai/gpt-realtime-2`, `xai/grok-voice-latest`, `gemini/gemini-3.1-flash-live-preview`.                                                                                                   |
| `voice`                     | string    | Provider-specific voice id.                                                                                                                                                                                                      |
| `instructions`              | string    | System prompt.                                                                                                                                                                                                                   |
| `modalities`                | string\[] | Output modalities. OpenAI realtime accepts a single value — `["audio"]` or `["text"]`, not both. xAI and Gemini accept either form. Values outside the resolved model's capabilities are rejected before the upstream is dialed. |
| `turn_detection`            | object    | `{ type, threshold, prefix_padding_ms, silence_duration_ms }`.                                                                                                                                                                   |
| `tools`                     | object\[] | Function-calling schema.                                                                                                                                                                                                         |
| `reasoning_effort`          | string    | `gpt-realtime-2` only. `minimal` / `low` / `medium` / `high` / `xhigh`.                                                                                                                                                          |
| `input_transcription`       | bool      | Surface `transcript.committed` events. Off by default.                                                                                                                                                                           |
| `input_transcription_model` | string    | OpenAI only — selects the transcription model.                                                                                                                                                                                   |
| `output_transcription`      | bool      | Surface assistant-speech `text.delta` events. Off by default.                                                                                                                                                                    |
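
The capability rules in this table can be mirrored as a client-side precheck so obvious mistakes surface before the upstream is dialed. This sketch assumes the provider is the part of `model` before the first `/`, and `precheckStartConfig` is a hypothetical helper that returns human-readable warnings rather than throwing. The server's own validation remains authoritative.

```typescript
interface StartConfig {
  model: string;
  voice?: string;
  instructions?: string;
  modalities?: string[];
  turn_detection?: { type: string; threshold?: number; prefix_padding_ms?: number; silence_duration_ms?: number };
  tools?: object[];
  reasoning_effort?: "minimal" | "low" | "medium" | "high" | "xhigh";
  input_transcription?: boolean;
  input_transcription_model?: string;
  output_transcription?: boolean;
}

function precheckStartConfig(c: StartConfig): string[] {
  const problems: string[] = [];
  // Assumption: ids look like "provider/model-name".
  const [provider, modelName] = c.model.split("/", 2);
  if (!provider || !modelName) {
    problems.push("model must be provider-prefixed, e.g. openai/gpt-realtime-2");
  }
  // OpenAI realtime accepts exactly one output modality.
  if (provider === "openai" && c.modalities && c.modalities.length !== 1) {
    problems.push('OpenAI realtime takes a single modality: ["audio"] or ["text"]');
  }
  if (c.reasoning_effort !== undefined && modelName !== "gpt-realtime-2") {
    problems.push("reasoning_effort is only supported on gpt-realtime-2");
  }
  if (c.input_transcription_model !== undefined && provider !== "openai") {
    problems.push("input_transcription_model is OpenAI-only");
  }
  return problems;
}
```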

## Server → client events

| Event                                     | Fields                                                                  | Purpose                                                                                                                                                                                                                                                                                                                                                                                                      |
| ----------------------------------------- | ----------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `session.started`                         | `session_id`, `input_sample_rate`, `output_sample_rate`, `audio_format` | Upstream session is live.                                                                                                                                                                                                                                                                                                                                                                                    |
| `audio.delta`                             | `audio` (base64 PCM16)                                                  | Assistant audio chunk at `output_sample_rate`.                                                                                                                                                                                                                                                                                                                                                               |
| `text.delta`                              | `delta`                                                                 | Streaming assistant text (when transcription opt-in is on, or model emits text).                                                                                                                                                                                                                                                                                                                             |
| `transcript.committed`                    | `transcript`                                                            | User-speech transcript (when `input_transcription` is on).                                                                                                                                                                                                                                                                                                                                                   |
| `speech.started` / `speech.stopped`       | —                                                                       | VAD events for the user's microphone.                                                                                                                                                                                                                                                                                                                                                                        |
| `response.started` / `response.completed` | —                                                                       | Lifecycle markers for an assistant turn.                                                                                                                                                                                                                                                                                                                                                                     |
| `tool.call`                               | `tool_call_id`, `tool_name`, `tool_arguments`                           | Function call request from the model.                                                                                                                                                                                                                                                                                                                                                                        |
| `session.terminating`                     | `error.code`, `error.message`                                           | Server is closing the session. See termination codes below.                                                                                                                                                                                                                                                                                                                                                  |
| `session.ended`                           | —                                                                       | Final frame before the upstream WS closes.                                                                                                                                                                                                                                                                                                                                                                   |
| `error`                                   | `error.code`, `error.message`                                           | Provider or protocol error. **Mid-session errors:** `error.code` is a stable Opper code for protocol-level rejections (`unsupported_modalities`, `invalid_config`, `provider_error`, etc.) or a pass-through provider code for upstream-originated errors. Treat unknown codes as recoverable and surface `error.message` to the user. **Do not** enumerate — new codes can land as new providers are added. |
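
A sketch of typing the server → client events and routing them with a `switch`. `describeServerEvent` is a hypothetical helper that only returns a summary string; in a real client each branch would feed audio playback, the UI, or a tool handler. The `error` branch deliberately does not enumerate codes, since they are an open set.

```typescript
type ServerEvent =
  | { type: "session.started"; session_id: string; input_sample_rate: number; output_sample_rate: number; audio_format: string }
  | { type: "audio.delta"; audio: string }
  | { type: "text.delta"; delta: string }
  | { type: "transcript.committed"; transcript: string }
  | { type: "speech.started" }
  | { type: "speech.stopped" }
  | { type: "response.started" }
  | { type: "response.completed" }
  | { type: "tool.call"; tool_call_id: string; tool_name: string; tool_arguments: unknown }
  | { type: "session.terminating"; error: { code: string; message: string } }
  | { type: "session.ended" }
  | { type: "error"; error: { code: string; message: string } };

function describeServerEvent(raw: string): string {
  const event = JSON.parse(raw) as ServerEvent;
  switch (event.type) {
    case "session.started":
      return `live at ${event.output_sample_rate} Hz output`;
    case "text.delta":
      return `text: ${event.delta}`;
    case "transcript.committed":
      return `user said: ${event.transcript}`;
    case "tool.call":
      return `call ${event.tool_name} (${event.tool_call_id})`;
    case "session.terminating":
      return `terminating: ${event.error.code}`;
    case "error":
      // Open-ended codes: treat as recoverable and surface the message.
      return `recoverable error: ${event.error.message}`;
    default:
      // Lifecycle markers and audio chunks: just report the event type here.
      return event.type;
  }
}
```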

### Termination codes

`session.terminating.error.code` is a closed set; you can enumerate against it:

| Code                    | Cause                                                    |
| ----------------------- | -------------------------------------------------------- |
| `session_timeout`       | Session exceeded the 30-minute max duration.             |
| `idle_timeout`          | No client traffic for 60 seconds.                        |
| `balance_exhausted`     | Organization balance is empty.                           |
| `project_spend_cap_hit` | Project hit its configured spend cap.                    |
| `org_spend_cap_hit`     | Organization hit its configured spend cap.               |
| `billing_not_supported` | Account on a plan that doesn't support realtime billing. |

## Sample payloads

Every event is a JSON object over a text WebSocket frame. Field shapes below are
the source of truth for client implementations and agent code generation —
match these exactly.

### Client → server

```json session.start
{
  "type": "session.start",
  "config": {
    "model": "openai/gpt-realtime-2",
    "voice": "marin",
    "instructions": "You are a concise voice assistant.",
    "modalities": ["audio"],
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "tools": [
      {
        "name": "lookup_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    ],
    "reasoning_effort": "low",
    "input_transcription": true,
    "input_transcription_model": "gpt-4o-mini-transcribe",
    "output_transcription": false
  }
}
```

```json session.update
{
  "type": "session.update",
  "config": { "voice": "verse", "instructions": "Be terser." }
}
```
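
The provider notes on `session.update` (OpenAI rejects mid-session changes to the input-transcription fields; Gemini ignores the event) can be enforced before sending. In this sketch, `buildSessionUpdate` is hypothetical: it derives the provider from the `model` prefix, throws on OpenAI start-only fields, and returns `null` for Gemini so the caller can skip sending.

```typescript
type ConfigPatch = Record<string, unknown>;

interface SessionUpdate {
  type: "session.update";
  config: ConfigPatch;
}

// Fields OpenAI only accepts at session.start.
const OPENAI_START_ONLY = ["input_transcription", "input_transcription_model"];

function buildSessionUpdate(model: string, patch: ConfigPatch): SessionUpdate | null {
  const provider = model.split("/")[0];
  // Gemini ignores session.update entirely, so sending it would have no effect.
  if (provider === "gemini") return null;
  if (provider === "openai") {
    const blocked = OPENAI_START_ONLY.filter((key) => key in patch);
    if (blocked.length > 0) {
      // Throw instead of silently dropping so the caller notices the mistake.
      throw new Error(`OpenAI rejects mid-session changes to: ${blocked.join(", ")}`);
    }
  }
  return { type: "session.update", config: { ...patch } };
}
```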

```json audio.append
{ "type": "audio.append", "audio": "<base64-encoded PCM16 chunk>" }
```
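
`audio.append` expects base64-encoded PCM16. A minimal encoder from `Float32Array` samples (the format Web Audio produces) might look like this. It assumes mono samples in [-1, 1] already at the session's `input_sample_rate`, and little-endian byte order, the common PCM16 convention but not stated on this page. The base64 step is written out by hand so the block has no dependencies; `btoa` or Node's `Buffer` would do the same job.

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard padded base64 encoding of raw bytes.
function bytesToBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (b0 << 16) | (b1 << 8) | b2;
    out += B64.charAt((n >> 18) & 63) + B64.charAt((n >> 12) & 63);
    out += i + 1 < bytes.length ? B64.charAt((n >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64.charAt(n & 63) : "=";
  }
  return out;
}

// Float samples -> signed 16-bit little-endian PCM -> base64.
function floatToPcm16Base64(samples: Float32Array): string {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  samples.forEach((s, i) => {
    const clamped = Math.max(-1, Math.min(1, s));
    view.setInt16(i * 2, Math.round(clamped * 32767), true); // true = little-endian
  });
  return bytesToBase64(new Uint8Array(view.buffer));
}

function audioAppendFrame(samples: Float32Array): string {
  return JSON.stringify({ type: "audio.append", audio: floatToPcm16Base64(samples) });
}
```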

```json text.input
{ "type": "text.input", "text": "What's the weather in Stockholm?" }
```

```json tool.result
{
  "type": "tool.result",
  "tool_call_id": "call_abc123",
  "tool_result": { "temperature_c": 12, "conditions": "rain" }
}
```

### Server → client

```json session.started
{
  "type": "session.started",
  "session_id": "sess_xyz",
  "input_sample_rate": 24000,
  "output_sample_rate": 24000,
  "audio_format": "pcm16"
}
```

```json audio.delta
{ "type": "audio.delta", "audio": "<base64-encoded PCM16 chunk>" }
```
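
For playback scheduling it can help to know how long each `audio.delta` chunk lasts. Assuming mono PCM16 (2 bytes per sample) and standard padded base64 with no whitespace, the duration follows from the decoded byte count and `output_sample_rate`. Both helpers are hypothetical.

```typescript
// Decoded size of a padded base64 string: 3 bytes per 4 characters, minus padding.
function base64DecodedLength(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return (b64.length / 4) * 3 - padding;
}

// duration = (bytes / 2 bytes per sample) / samples per second, in milliseconds.
function chunkDurationMs(audioB64: string, outputSampleRate: number): number {
  const samples = base64DecodedLength(audioB64) / 2;
  return (samples / outputSampleRate) * 1000;
}
```

For example, a chunk that decodes to 48,000 bytes at 24,000 Hz holds 24,000 samples, which is exactly one second of audio.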

```json text.delta
{ "type": "text.delta", "delta": "It's currently raining" }
```

```json transcript.committed
{
  "type": "transcript.committed",
  "transcript": "what's the weather in stockholm"
}
```

```json speech.started
{ "type": "speech.started" }
```

```json response.started
{ "type": "response.started" }
```

```json tool.call
{
  "type": "tool.call",
  "tool_call_id": "call_abc123",
  "tool_name": "lookup_weather",
  "tool_arguments": { "city": "Stockholm" }
}
```
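
The tool flow pairs each `tool.call` with a `tool.result` carrying the same `tool_call_id`. This sketch looks the tool up in a map of async handlers; `answerToolCall`, `ToolHandlers`, and the `{ error }` shape used for failures are assumptions for illustration, not an Opper convention.

```typescript
interface ToolCall {
  type: "tool.call";
  tool_call_id: string;
  tool_name: string;
  tool_arguments: Record<string, unknown>;
}

type ToolHandlers = Record<string, (args: Record<string, unknown>) => Promise<unknown>>;

async function answerToolCall(call: ToolCall, handlers: ToolHandlers): Promise<string> {
  const handler = handlers[call.tool_name];
  let toolResult: unknown;
  try {
    toolResult = handler
      ? await handler(call.tool_arguments)
      : { error: `unknown tool: ${call.tool_name}` };
  } catch (err) {
    // Report failures back to the model instead of leaving the call unanswered.
    toolResult = { error: String(err) };
  }
  // Echo the tool_call_id so the result is matched to the right request.
  return JSON.stringify({ type: "tool.result", tool_call_id: call.tool_call_id, tool_result: toolResult });
}
```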

```json session.terminating
{
  "type": "session.terminating",
  "error": {
    "code": "idle_timeout",
    "message": "No client traffic for 60 seconds."
  }
}
```

```json error
{
  "type": "error",
  "error": {
    "code": "unsupported_modalities",
    "message": "config.modalities contains \"text\" but model \"fake/audio-only\" does not support text output"
  }
}
```

## Preflight rejections

If the upgrade request is rejected, the endpoint responds with a standard HTTP status code instead of `101 Switching Protocols`:

| Status                    | Cause                                                               |
| ------------------------- | ------------------------------------------------------------------- |
| `401 Unauthorized`        | Missing, non-runtime, or project-less API key.                      |
| `402 Payment Required`    | Balance exhausted, spend cap hit, or plan doesn't support realtime. |
| `429 Too Many Requests`   | Concurrent-session cap reached for this project (default 5).        |
| `503 Service Unavailable` | Realtime endpoint not configured for the requested provider.        |
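
Server-side clients can read the status of a rejected upgrade (the Node `ws` package, for instance, emits an `unexpected-response` event). Browsers only report a generic error followed by a close with code 1006. Where the status is available, a mapping like this hypothetical `explainPreflight` turns it into something actionable.

```typescript
// Hypothetical helper: maps preflight HTTP statuses to user-facing explanations.
function explainPreflight(status: number): string {
  switch (status) {
    case 401:
      return "Check the API key: it must be a project-scoped runtime key.";
    case 402:
      return "Billing blocks realtime: balance exhausted, spend cap hit, or unsupported plan.";
    case 429:
      return "Too many concurrent sessions for this project; close one and retry.";
    case 503:
      return "Realtime is not configured for the requested provider.";
    default:
      return `Unexpected status ${status}.`;
  }
}
```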

## See also

* **[Realtime voice guide](/capabilities/realtime)** — full journey including authentication, session lifecycle, tool flow, billing, per-provider notes, and a working TypeScript example.
* **[brainstorm-time cookbook](https://github.com/opper-ai/opper-cookbook/tree/main/examples/brainstorm-time)** — complete end-to-end voice app.
