The Opper Realtime API is a bidirectional WebSocket protocol — it can’t be fully represented in OpenAPI, so this page documents the event vocabulary. For the end-to-end journey (authentication, session lifecycle, tool flow, per-provider notes, billing), see the Realtime voice guide.

Connection

wss://api.opper.ai/v3/realtime
Two authentication paths are accepted on the upgrade request:
  • Server-side (bearer token): Authorization: Bearer <project-scoped runtime API key>.
  • Browser (ephemeral ticket): Sec-WebSocket-Protocol: opper-ticket.<value> subprotocol header (recommended), or ?ticket=<value> query parameter (fallback — bearer credentials in URLs end up in access logs). Tickets are minted by POST /v3/realtime-sessions and are single-use.
Successful upgrade returns 101 Switching Protocols. The first frame the client sends must be session.start.
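
The two authentication paths can be sketched as a small helper that produces the URL, headers, and subprotocol list for the upgrade request. This is a minimal sketch using only the standard library; `upgrade_params` and its arguments are hypothetical names, and the returned values still have to be passed to whatever WebSocket client you use.

```python
from urllib.parse import urlencode

REALTIME_URL = "wss://api.opper.ai/v3/realtime"


def upgrade_params(api_key=None, ticket=None, ticket_in_query=False):
    """Return (url, headers, subprotocols) for the WebSocket upgrade request."""
    if (api_key is None) == (ticket is None):
        raise ValueError("pass exactly one of api_key or ticket")
    if api_key is not None:
        # Server-side path: bearer token in the Authorization header.
        return REALTIME_URL, {"Authorization": f"Bearer {api_key}"}, []
    if ticket_in_query:
        # Fallback path: the ticket becomes part of the URL and may be logged.
        return f"{REALTIME_URL}?{urlencode({'ticket': ticket})}", {}, []
    # Recommended browser path: ticket carried in Sec-WebSocket-Protocol.
    return REALTIME_URL, {}, [f"opper-ticket.{ticket}"]
```

Because tickets are single-use, a helper like this would be called once per connection attempt with a freshly minted ticket.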

Client → server events

| Event | Required fields | Purpose |
| --- | --- | --- |
| session.start | config.model | Open the session. Must be the first frame. |
| session.update | config | Update session config mid-stream. Capability validation re-runs after any ticket overlay: unsupported modalities / voice / reasoning_effort values are rejected before reaching the upstream. OpenAI rejects mid-session changes to input_transcription and input_transcription_model (one-shot at start). Gemini ignores session.update entirely. |
| audio.append | audio (base64 PCM16) | Stream a chunk of microphone audio. |
| audio.commit | | Mark end of speech turn (when not using server VAD). |
| audio.clear | | Discard buffered uncommitted audio. |
| text.input | text | Send a typed user message. |
| response.create | | Force a model response now. |
| response.cancel | | Cancel an in-flight response. |
| tool.result | tool_call_id, tool_result | Return a function call result the model requested. |
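
As an illustration of the client events listed above, the following sketch builds the JSON text frames with the standard library. The function names are hypothetical; only the `type` values and field names come from this page.

```python
import base64
import json

# Events that carry no fields beyond "type".
BARE_EVENTS = {"audio.commit", "audio.clear", "response.create", "response.cancel"}


def audio_append(pcm16: bytes) -> str:
    """Wrap a chunk of raw PCM16 bytes in an audio.append frame."""
    if len(pcm16) % 2:
        raise ValueError("PCM16 data must contain whole 16-bit samples")
    encoded = base64.b64encode(pcm16).decode("ascii")
    return json.dumps({"type": "audio.append", "audio": encoded})


def text_input(text: str) -> str:
    """Build a text.input frame for a typed user message."""
    return json.dumps({"type": "text.input", "text": text})


def bare_event(event_type: str) -> str:
    """Build one of the field-less client events."""
    if event_type not in BARE_EVENTS:
        raise ValueError(f"{event_type} is not a field-less client event")
    return json.dumps({"type": event_type})
```

Each returned string is meant to be sent as one text WebSocket frame.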

session.start config

| Field | Type | Notes |
| --- | --- | --- |
| model | string | Required. Provider-prefixed id, e.g. openai/gpt-realtime-2, xai/grok-voice-latest, gemini/gemini-3.1-flash-live-preview. |
| voice | string | Provider-specific voice id. |
| instructions | string | System prompt. |
| modalities | string[] | Output modalities. OpenAI realtime accepts a single value, ["audio"] or ["text"], not both. xAI and Gemini accept either form. Values outside the resolved model’s capabilities are rejected before the upstream is dialed. |
| turn_detection | object | { type, threshold, prefix_padding_ms, silence_duration_ms }. |
| tools | object[] | Function-calling schema. |
| reasoning_effort | string | gpt-realtime-2 only. minimal / low / medium / high / xhigh. |
| input_transcription | bool | Surface transcript.committed events. Off by default. |
| input_transcription_model | string | OpenAI only. Selects the transcription model. |
| output_transcription | bool | Surface assistant-speech text.delta events. Off by default. |
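
The provider-specific rules in the table can be checked on the client before sending session.start. The sketch below is a convenience under stated assumptions, not the server's validation logic: `check_start_config` is a hypothetical helper, it covers only the rules spelled out on this page, and the server remains the authority on what a model supports.

```python
REASONING_EFFORTS = {"minimal", "low", "medium", "high", "xhigh"}


def check_start_config(config: dict) -> list[str]:
    """Return a list of problems found in a session.start config (empty if none)."""
    model = config.get("model")
    if not isinstance(model, str) or "/" not in model:
        return ["model is required and must be a provider-prefixed id"]
    provider, name = model.split("/", 1)
    problems = []

    # OpenAI realtime accepts exactly one output modality.
    modalities = config.get("modalities")
    if provider == "openai" and modalities is not None and len(modalities) != 1:
        problems.append("openai models accept a single modality")

    # reasoning_effort is documented for gpt-realtime-2 only.
    effort = config.get("reasoning_effort")
    if effort is not None:
        if name != "gpt-realtime-2":
            problems.append("reasoning_effort applies to gpt-realtime-2 only")
        elif effort not in REASONING_EFFORTS:
            problems.append(f"unknown reasoning_effort: {effort}")

    # input_transcription_model is documented as OpenAI only.
    if "input_transcription_model" in config and provider != "openai":
        problems.append("input_transcription_model applies to openai models only")
    return problems
```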

Server → client events

| Event | Fields | Purpose |
| --- | --- | --- |
| session.started | session_id, input_sample_rate, output_sample_rate, audio_format | Upstream session is live. |
| audio.delta | audio (base64 PCM16) | Assistant audio chunk at output_sample_rate. |
| text.delta | delta | Streaming assistant text (when transcription opt-in is on, or model emits text). |
| transcript.committed | transcript | User-speech transcript (when input_transcription is on). |
| speech.started / speech.stopped | | VAD events for the user’s microphone. |
| response.started / response.completed | | Lifecycle markers for an assistant turn. |
| tool.call | tool_call_id, tool_name, tool_arguments | Function call request from the model. |
| session.terminating | error.code, error.message | Server is closing the session. See termination codes below. |
| session.ended | | Final frame before the upstream WS closes. |
| error | error.code, error.message | Provider or protocol error. Mid-session errors: error.code is a stable Opper code for protocol-level rejections (unsupported_modalities, invalid_config, provider_error, etc.) or a pass-through provider code for upstream-originated errors. Treat unknown codes as recoverable and surface error.message to the user. Do not enumerate against this set; new codes can land as new providers are added. |
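
One way to consume the server events above is a small dispatcher that folds each incoming text frame into client state. This is a sketch only: `handle_frame` and the keys of the `state` dict are hypothetical, and events it does not recognise are ignored rather than treated as failures.

```python
import base64
import json


def handle_frame(raw: str, state: dict) -> None:
    """Apply one server-to-client JSON frame to a mutable state dict."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "session.started":
        state["session_id"] = event["session_id"]
        state["output_sample_rate"] = event["output_sample_rate"]
    elif kind == "audio.delta":
        # Audio arrives as base64 PCM16; accumulate the decoded bytes.
        state.setdefault("audio", bytearray()).extend(base64.b64decode(event["audio"]))
    elif kind == "text.delta":
        state["text"] = state.get("text", "") + event["delta"]
    elif kind == "transcript.committed":
        state.setdefault("transcripts", []).append(event["transcript"])
    elif kind in ("error", "session.terminating"):
        err = event.get("error", {})
        state.setdefault("errors", []).append((kind, err.get("code"), err.get("message")))
    # Lifecycle and VAD events (and any unknown types) are ignored in this sketch.
```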

Termination codes

session.terminating.error.code is a closed set; you can enumerate against it:
| Code | Cause |
| --- | --- |
| session_timeout | Session exceeded the 30-minute max duration. |
| idle_timeout | No client traffic for 60 seconds. |
| balance_exhausted | Organization balance is empty. |
| project_spend_cap_hit | Project hit its configured spend cap. |
| org_spend_cap_hit | Organization hit its configured spend cap. |
| billing_not_supported | Account on a plan that doesn’t support realtime billing. |
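
Because the termination codes form a closed set, a client can branch on them directly. The sketch below groups them into timeouts and billing stops; the grouping and the `termination_kind` name are assumptions made for illustration and are not part of the protocol.

```python
TIMEOUT_CODES = {"session_timeout", "idle_timeout"}
BILLING_CODES = {
    "balance_exhausted",
    "project_spend_cap_hit",
    "org_spend_cap_hit",
    "billing_not_supported",
}


def termination_kind(frame: dict) -> str:
    """Classify a parsed session.terminating frame as 'timeout', 'billing', or 'unknown'."""
    code = frame.get("error", {}).get("code")
    if code in TIMEOUT_CODES:
        return "timeout"
    if code in BILLING_CODES:
        return "billing"
    # Defensive default in case the frame is malformed or the code is missing.
    return "unknown"
```

A client might, for example, offer to start a new session after a timeout and show a billing notice otherwise; that policy is left to the application.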

Sample payloads

Every event is a JSON object over a text WebSocket frame. Field shapes below are the source of truth for client implementations and agent code generation — match these exactly.

Client → server

session.start
{
  "type": "session.start",
  "config": {
    "model": "openai/gpt-realtime-2",
    "voice": "marin",
    "instructions": "You are a concise voice assistant.",
    "modalities": ["audio"],
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "tools": [
      {
        "name": "lookup_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    ],
    "reasoning_effort": "low",
    "input_transcription": true,
    "input_transcription_model": "gpt-4o-mini-transcribe",
    "output_transcription": false
  }
}
session.update
{
  "type": "session.update",
  "config": { "voice": "verse", "instructions": "Be terser." }
}
audio.append
{ "type": "audio.append", "audio": "<base64-encoded PCM16 chunk>" }
text.input
{ "type": "text.input", "text": "What's the weather in Stockholm?" }
tool.result
{
  "type": "tool.result",
  "tool_call_id": "call_abc123",
  "tool_result": { "temperature_c": 12, "conditions": "rain" }
}

Server → client

session.started
{
  "type": "session.started",
  "session_id": "sess_xyz",
  "input_sample_rate": 24000,
  "output_sample_rate": 24000,
  "audio_format": "pcm16"
}
audio.delta
{ "type": "audio.delta", "audio": "<base64-encoded PCM16 chunk>" }
text.delta
{ "type": "text.delta", "delta": "It's currently raining" }
transcript.committed
{
  "type": "transcript.committed",
  "transcript": "what's the weather in stockholm"
}
speech.started
{ "type": "speech.started" }
response.started
{ "type": "response.started" }
tool.call
{
  "type": "tool.call",
  "tool_call_id": "call_abc123",
  "tool_name": "lookup_weather",
  "tool_arguments": { "city": "Stockholm" }
}
session.terminating
{
  "type": "session.terminating",
  "error": {
    "code": "idle_timeout",
    "message": "No client traffic for 60 seconds."
  }
}
error
{
  "type": "error",
  "error": {
    "code": "unsupported_modalities",
    "message": "config.modalities contains \"text\" but model \"fake/audio-only\" does not support text output"
  }
}
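
The tool.call and tool.result payloads above pair up through tool_call_id. A minimal sketch of answering a call locally follows; `lookup_weather` here is a hypothetical stand-in that returns fixed data, and the `{"error": ...}` shape returned for an unknown tool is an assumption, since this page does not define how tool failures should be reported.

```python
import json


def lookup_weather(city: str) -> dict:
    # Hypothetical handler: a real one would query a weather service.
    return {"temperature_c": 12, "conditions": "rain"}


HANDLERS = {"lookup_weather": lookup_weather}


def answer_tool_call(raw: str) -> str:
    """Turn a tool.call frame into the matching tool.result frame."""
    call = json.loads(raw)
    handler = HANDLERS.get(call["tool_name"])
    if handler is None:
        # Assumed error shape; not specified by the protocol reference.
        result = {"error": f"unknown tool {call['tool_name']}"}
    else:
        result = handler(**call["tool_arguments"])
    return json.dumps(
        {"type": "tool.result", "tool_call_id": call["tool_call_id"], "tool_result": result}
    )
```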

Preflight rejections

Before the WebSocket upgrade, the endpoint returns standard HTTP status codes:
| Status | Cause |
| --- | --- |
| 401 Unauthorized | Missing, non-runtime, or project-less API key. |
| 402 Payment Required | Balance exhausted, spend cap hit, or plan doesn’t support realtime. |
| 429 Too Many Requests | Concurrent-session cap reached for this project (default 5). |
| 503 Service Unavailable | Realtime endpoint not configured for the requested provider. |
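
A client that sees the upgrade fail can map the HTTP status back to the causes listed above. In this sketch the `explain_preflight` name is hypothetical, and treating only 429 as worth retrying later is an assumption (a session slot may free up), not documented behaviour.

```python
PREFLIGHT_CAUSES = {
    401: "Missing, non-runtime, or project-less API key.",
    402: "Balance exhausted, spend cap hit, or plan doesn't support realtime.",
    429: "Concurrent-session cap reached for this project.",
    503: "Realtime endpoint not configured for the requested provider.",
}

# Assumption: only the concurrency cap is likely to clear without operator action.
RETRY_LATER = {429}


def explain_preflight(status: int) -> tuple[str, bool]:
    """Return (cause, retry_later) for a failed upgrade request."""
    cause = PREFLIGHT_CAUSES.get(status, f"Unexpected HTTP status {status}.")
    return cause, status in RETRY_LATER
```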
