Realtime quickstart

This is the shortest path to a working voice loop. Pick the tab that matches where your code runs, copy the snippet, replace the API key, and you have a live session. For the full protocol (every event, every config field, per-provider notes, billing, transcription, tool flow), see the Realtime voice guide.

1. Get an API key

Create a project-scoped runtime API key in the Opper dashboard. Management keys (opmak-…) are rejected — only runtime keys can open realtime sessions.
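Because a management key is rejected at connect time, it can help to catch the mistake before opening a socket. The following is a minimal sketch, not part of the Opper SDK: the function name assertRuntimeKey is hypothetical, and the only fact it relies on is the opmak- prefix mentioned above.

```javascript
// Sketch: fail fast on an unusable key before attempting a realtime connection.
// assertRuntimeKey is a hypothetical helper; only the "opmak-" prefix comes from the docs.
function assertRuntimeKey(key) {
  if (typeof key !== "string" || key.length === 0) {
    throw new Error("API key is missing (is OPPER_API_KEY set?)");
  }
  if (key.startsWith("opmak-")) {
    throw new Error("Management keys (opmak-...) cannot open realtime sessions; use a runtime key");
  }
  return key;
}

// Typical use:
// const apiKey = assertRuntimeKey(process.env.OPPER_API_KEY);
```

The check only inspects the prefix, so it cannot confirm that a key is valid; the server remains the authority on that.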

2. Open a session

Server (Node.js)
Browser (ephemeral ticket)

Server-side clients connect directly with a bearer token. This is the fastest way to verify the endpoint works end to end.

npm install ws

import WebSocket from "ws";

const ws = new WebSocket("wss://api.opper.ai/v3/realtime", {
  headers: { Authorization: `Bearer ${process.env.OPPER_API_KEY}` },
});

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.start",
    config: {
      model: "openai/gpt-realtime-2",
      voice: "marin",
      instructions: "You are a concise voice assistant.",
    },
  }));
});

ws.on("message", (raw) => {
  const ev = JSON.parse(raw.toString());
  if (ev.type === "session.started") {
    console.log(`live: ${ev.session_id} @ ${ev.output_sample_rate}Hz`);
  }
  if (ev.type === "audio.delta") {
    // ev.audio is base64-encoded PCM16 — pipe to your audio player.
  }
});
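The audio.delta handler above leaves decoding to you. As a rough sketch, the helpers below turn the base64 string in ev.audio into 16-bit samples and estimate a chunk's duration. Two assumptions are not stated in this guide: the samples are little-endian (the usual PCM16 byte order) and the stream is mono. The helper names are illustrative.

```javascript
// Decode a base64 PCM16 payload (e.g. ev.audio from audio.delta) into samples.
// Assumption: little-endian byte order; the guide does not state endianness.
function decodePcm16(base64Audio) {
  const bytes = Buffer.from(base64Audio, "base64");
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = bytes.readInt16LE(i * 2);
  }
  return samples;
}

// Duration of a decoded chunk in milliseconds.
// Assumption: one channel (mono), so one sample per frame.
function chunkDurationMs(samples, sampleRate) {
  return (samples.length * 1000) / sampleRate;
}
```

Inside the message handler you could call decodePcm16(ev.audio) and pass the result, together with the output_sample_rate reported by session.started, to whatever audio player you use.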

Browsers can’t set an Authorization header on the native WebSocket constructor. Mint a single-use ticket from your backend, then redeem it from the browser.

Step A — your backend mints the ticket:

// POST from your trusted server (Node, Python, anything)
const resp = await fetch("https://api.opper.ai/v3/realtime-sessions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPPER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    config: {
      model: "openai/gpt-realtime-2",
      voice: "marin",
      instructions: "You are a concise voice assistant.",
    },
  }),
});
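// Surface mint failures (bad key, invalid config) instead of passing undefined along.
if (!resp.ok) throw new Error(`Ticket mint failed: HTTP ${resp.status}`);
const { client_secret } = await resp.json();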
// Return client_secret to the browser.

Step B — the browser redeems it as a WebSocket subprotocol (sent in the Sec-WebSocket-Protocol header):

// clientSecret is the client_secret value your backend returned in Step A.
const ws = new WebSocket(
  "wss://api.opper.ai/v3/realtime",
  [`opper-ticket.${clientSecret}`],
);

ws.onopen = () => {
  // Bound fields (model, voice, instructions) are already locked in
  // by the ticket. Send any remaining session.start fields here.
  ws.send(JSON.stringify({ type: "session.start", config: {} }));
};

ws.onmessage = (e) => {
  const ev = JSON.parse(e.data);
  // Handle session.started, audio.delta, tool.call, etc.
};

Tickets are single-use and expire in 60 seconds by default. Whatever fields you populate in the mint request are locked — the browser can’t override them. See Pre-binding for security.
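Since a ticket is single-use and short-lived, a client may want to refuse a stale one locally and request a fresh ticket on every reconnect. The sketch below builds the subprotocol string only while the ticket is within the default 60-second window. The helper name and the idea of tracking the mint time on the client are assumptions, not part of the API, and the server still makes the final decision.

```javascript
// Default ticket lifetime stated in this guide; adjust if you mint with a different expiry.
const TICKET_TTL_MS = 60_000;

// Hypothetical helper: returns the subprotocol string for a ticket that is
// still inside its lifetime, or throws so the caller can mint a new one.
// mintedAtMs is the time your app received the ticket (an app-side assumption).
function ticketSubprotocol(clientSecret, mintedAtMs, nowMs = Date.now()) {
  if (nowMs - mintedAtMs >= TICKET_TTL_MS) {
    throw new Error("Ticket expired; mint a new one before connecting");
  }
  return `opper-ticket.${clientSecret}`;
}
```

Because tickets cannot be reused, a reconnect path would call the backend mint endpoint again rather than caching the previous client_secret.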

3. What you’ll see

On a successful open:

{ "type": "session.started", "session_id": "sess_...", "input_sample_rate": 24000, "output_sample_rate": 24000, "audio_format": "pcm16" }

From here, stream audio in with audio.append (base64-encoded PCM16 at input_sample_rate), and the assistant’s audio comes back in audio.delta frames at output_sample_rate. If a tool fires you get a tool.call and reply with tool.result. When the session ends you get session.terminating followed by a clean WebSocket close.
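To illustrate the inbound half of that loop, here is a minimal sketch that converts floating-point samples (the range Web Audio produces, -1 to 1) into base64 PCM16 and wraps them in an audio.append message. The payload field name audio is an assumption that mirrors ev.audio on audio.delta; check the protocol reference for the actual field. Little-endian byte order is also assumed, and Buffer makes this the Node.js variant (a browser would need its own base64 step).

```javascript
// Convert Float32 samples in [-1, 1] to base64-encoded PCM16.
// Assumption: little-endian output, the usual PCM16 byte order.
function encodePcm16(float32Samples) {
  const bytes = Buffer.alloc(float32Samples.length * 2);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const clamped = Math.max(-1, Math.min(1, float32Samples[i]));
    const scaled = clamped < 0 ? clamped * 32768 : clamped * 32767;
    bytes.writeInt16LE(Math.round(scaled), i * 2);
  }
  return bytes.toString("base64");
}

// Build the JSON text for one audio.append frame.
// Assumption: the payload field is named "audio", mirroring audio.delta.
function audioAppendMessage(float32Samples) {
  return JSON.stringify({ type: "audio.append", audio: encodePcm16(float32Samples) });
}
```

With an open socket this would be sent as ws.send(audioAppendMessage(chunk)), where chunk has already been resampled to the input_sample_rate reported in session.started.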

Next steps

Realtime voice guide — every config field, every event, per-provider notes (OpenAI / xAI / Gemini), tool calls, transcription, billing, session limits, security model.
Realtime protocol reference — the WebSocket event vocabulary at a glance, with sample JSON payloads.
Mint endpoint reference — POST /v3/realtime-sessions request/response schema.
brainstorm-time cookbook — a complete browser voice app with microphone capture, audio playback, tools, and provider switching.

Switch providers

The protocol is unified — change one string:

Provider	Model id	Notes
OpenAI	`openai/gpt-realtime-2`	Reasoning effort supported. 24 kHz symmetric.
xAI	`xai/grok-voice-latest`	Per-minute billing. 24 kHz symmetric.
Gemini	`gemini/gemini-3.1-flash-live-preview`	Asymmetric sample rates — 16 kHz in / 24 kHz out.

See Per-provider notes for voice lists and quirks.
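One way to keep the switch to a single string in application code is a small lookup built from the table above. This is a sketch under stated assumptions: the PROVIDERS object and both helper names are illustrative, and voice is deliberately left out because voice lists differ per provider. The sample rates are copied from the table; at runtime, the values reported in session.started should take precedence.

```javascript
// Model ids and sample rates copied from the provider table above.
const PROVIDERS = {
  openai: { model: "openai/gpt-realtime-2", inputHz: 24000, outputHz: 24000 },
  xai: { model: "xai/grok-voice-latest", inputHz: 24000, outputHz: 24000 },
  gemini: { model: "gemini/gemini-3.1-flash-live-preview", inputHz: 16000, outputHz: 24000 },
};

// Hypothetical helper: build a session.start message for a provider key.
function sessionStartFor(provider, instructions) {
  const entry = PROVIDERS[provider];
  if (!entry) throw new Error(`Unknown provider: ${provider}`);
  return { type: "session.start", config: { model: entry.model, instructions } };
}

// Hypothetical helper: does microphone audio captured at captureHz need
// resampling before it is sent to this provider?
function needsInputResample(provider, captureHz) {
  const entry = PROVIDERS[provider];
  if (!entry) throw new Error(`Unknown provider: ${provider}`);
  return captureHz !== entry.inputHz;
}
```

For example, audio captured at 24 kHz can go to OpenAI or xAI unchanged, while the Gemini entry flags it for downsampling to 16 kHz.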
