This is the shortest path to a working voice loop. Pick the variant that matches where your code runs (server-side or browser), copy the snippet, supply your API key, and you have a live session.
For the full protocol (every event, every config field, per-provider notes, billing, transcription, tool flow), see the Realtime voice guide.
1. Get an API key
Create a project-scoped runtime API key in the Opper dashboard. Management keys (opmak-…) are rejected — only runtime keys can open realtime sessions.
2. Open a session
Server-side clients connect directly with a bearer token. This is the fastest way to verify the endpoint works end to end.

import WebSocket from "ws";

const ws = new WebSocket("wss://api.opper.ai/v3/realtime", {
  headers: { Authorization: `Bearer ${process.env.OPPER_API_KEY}` },
});

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.start",
    config: {
      model: "openai/gpt-realtime-2",
      voice: "marin",
      instructions: "You are a concise voice assistant.",
    },
  }));
});

ws.on("message", (raw) => {
  const ev = JSON.parse(raw.toString());
  if (ev.type === "session.started") {
    console.log(`live: ${ev.session_id} @ ${ev.output_sample_rate}Hz`);
  }
  if (ev.type === "audio.delta") {
    // ev.audio is base64-encoded PCM16; pipe it to your audio player.
  }
});
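The `audio.delta` handler above leaves the decoding step open. A minimal sketch of it for Node follows, assuming the PCM16 samples are little-endian and mono (the text here only says "PCM16"). `decodePcm16` and `chunkDurationMs` are hypothetical helper names, not part of any Opper SDK.

```javascript
// Decode one audio.delta payload (base64 PCM16) into signed 16-bit samples.
// ASSUMPTION: samples are little-endian; the quickstart only states "PCM16".
function decodePcm16(base64Audio) {
  const bytes = Buffer.from(base64Audio, "base64");
  const sampleCount = Math.floor(bytes.length / 2); // ignore a trailing odd byte
  const samples = new Int16Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = bytes.readInt16LE(i * 2);
  }
  return samples;
}

// Playback length of a chunk in milliseconds, given the rate reported
// in session.started (output_sample_rate).
function chunkDurationMs(samples, sampleRate) {
  return (samples.length * 1000) / sampleRate;
}
```

Inside the message handler this would be called as `decodePcm16(ev.audio)`, with the resulting `Int16Array` handed to whichever audio sink you use. `Buffer` is Node-only, so a browser would need a different base64 decoder.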
Browsers can’t set an Authorization header on the native WebSocket constructor. Mint a single-use ticket from your backend, then redeem it from the browser.

Step A. Your backend mints the ticket:

// POST from your trusted server (Node, Python, anything)
const resp = await fetch("https://api.opper.ai/v3/realtime-sessions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPPER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    config: {
      model: "openai/gpt-realtime-2",
      voice: "marin",
      instructions: "You are a concise voice assistant.",
    },
  }),
});
const { client_secret } = await resp.json();
// Return client_secret to the browser.
Step B. The browser redeems it via the subprotocol header:

const ws = new WebSocket(
  "wss://api.opper.ai/v3/realtime",
  [`opper-ticket.${clientSecret}`],
);

ws.onopen = () => {
  // Bound fields (model, voice, instructions) are already locked in
  // by the ticket. Send any remaining session.start fields here.
  ws.send(JSON.stringify({ type: "session.start", config: {} }));
};

ws.onmessage = (e) => {
  const ev = JSON.parse(e.data);
  // Handle session.started, audio.delta, tool.call, etc.
};
Tickets are single-use and expire in 60 seconds by default. Whatever fields you populate in the mint request are locked — the browser can’t override them. See Pre-binding for security.
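Because a ticket is single-use and short-lived, it helps to fail loudly when minting goes wrong instead of handing the browser an undefined secret. The sketch below wraps the mint request from Step A with basic checks. `mintTicket` and `ticketSubprotocols` are hypothetical helper names, the injected `fetchImpl` parameter exists only to make the function testable, and the error-response shape is not documented here, so only the HTTP status is inspected.

```javascript
// Hypothetical helper: mint a ticket and throw on a non-2xx response.
// fetchImpl is injectable so the function can run without a network.
async function mintTicket(apiKey, config, fetchImpl = fetch) {
  const resp = await fetchImpl("https://api.opper.ai/v3/realtime-sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ config }),
  });
  if (!resp.ok) {
    // ASSUMPTION: the error body format is unknown, so report the status only.
    throw new Error(`ticket mint failed with HTTP ${resp.status}`);
  }
  const { client_secret } = await resp.json();
  if (typeof client_secret !== "string" || client_secret.length === 0) {
    throw new Error("mint response did not include client_secret");
  }
  return client_secret;
}

// Build the subprotocol list the browser passes to the WebSocket constructor.
function ticketSubprotocols(clientSecret) {
  return [`opper-ticket.${clientSecret}`];
}
```

Mint a fresh ticket for every connection attempt, including reconnects, since a redeemed or expired ticket cannot be reused.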
3. What you’ll see
On a successful open:
{ "type": "session.started", "session_id": "sess_...", "input_sample_rate": 24000, "output_sample_rate": 24000, "audio_format": "pcm16" }
From here, stream audio in with audio.append (base64-encoded PCM16 at input_sample_rate), and the assistant’s audio comes back in audio.delta frames at output_sample_rate. If a tool fires you get a tool.call and reply with tool.result. When the session ends you get session.terminating followed by a clean WebSocket close.
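As an illustration of the input side, here is a minimal Node sketch that converts floating-point microphone samples into base64 PCM16 and wraps them in an `audio.append` message. Two things are assumptions rather than documented facts: the samples are written little-endian, and the payload field is called `audio` (mirroring `ev.audio` on `audio.delta`). Check the protocol reference for the actual field name. `encodePcm16` and `audioAppendMessage` are hypothetical helper names.

```javascript
// Convert Float32 samples in [-1, 1] to base64-encoded PCM16.
// ASSUMPTION: little-endian byte order.
function encodePcm16(floatSamples) {
  const bytes = Buffer.alloc(floatSamples.length * 2);
  for (let i = 0; i < floatSamples.length; i++) {
    // Clamp so out-of-range input cannot overflow the 16-bit range.
    const clamped = Math.max(-1, Math.min(1, floatSamples[i]));
    // Scale each half separately so -1 maps to -32768 and 1 maps to 32767.
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    bytes.writeInt16LE(Math.round(value), i * 2);
  }
  return bytes.toString("base64");
}

// ASSUMPTION: the payload field is named "audio"; the quickstart does not say.
function audioAppendMessage(floatSamples) {
  return JSON.stringify({
    type: "audio.append",
    audio: encodePcm16(floatSamples),
  });
}
```

The samples passed in should already be at the `input_sample_rate` reported by `session.started`; resampling is out of scope for this sketch.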
Next steps
- Realtime voice guide: every config field, every event, per-provider notes (OpenAI / xAI / Gemini), tool calls, transcription, billing, session limits, security model.
- Realtime protocol reference: the WebSocket event vocabulary at a glance, with sample JSON payloads.
- Mint endpoint reference: POST /v3/realtime-sessions request/response schema.
- brainstorm-time cookbook: a complete browser voice app with microphone capture, audio playback, tools, and provider switching.
Switch providers
The protocol is unified — change one string:
| Provider | Model id | Notes |
|---|---|---|
| OpenAI | openai/gpt-realtime-2 | Reasoning effort supported. 24 kHz symmetric. |
| xAI | xai/grok-voice-latest | Per-minute billing. 24 kHz symmetric. |
| Gemini | gemini/gemini-3.1-flash-live-preview | Asymmetric sample rates — 16 kHz in / 24 kHz out. |
See Per-provider notes for voice lists and quirks.
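One way to make the switch explicit is a small lookup keyed by provider, sketched below with the model ids from the table above. `sessionStartFor` and `ratesFrom` are hypothetical helper names. The sketch leaves out `voice` because voice names differ per provider; whether `voice` may be omitted is an assumption to verify against the guide. Since Gemini uses different input and output rates, the sample rates are read from `session.started` rather than hard-coded.

```javascript
// Model ids copied from the provider table.
const REALTIME_MODELS = new Map([
  ["openai", "openai/gpt-realtime-2"],
  ["xai", "xai/grok-voice-latest"],
  ["gemini", "gemini/gemini-3.1-flash-live-preview"],
]);

// Build a session.start message for the chosen provider.
// ASSUMPTION: voice is optional; it is omitted because voices are per-provider.
function sessionStartFor(provider, instructions) {
  const model = REALTIME_MODELS.get(provider);
  if (!model) {
    throw new Error(`unknown provider: ${provider}`);
  }
  return { type: "session.start", config: { model, instructions } };
}

// Take sample rates from the session.started event instead of assuming 24 kHz.
function ratesFrom(sessionStarted) {
  return {
    input: sessionStarted.input_sample_rate,
    output: sessionStarted.output_sample_rate,
  };
}
```

With this shape, switching from OpenAI to Gemini is `sessionStartFor("gemini", "...")`, and the audio pipeline adapts by sizing its buffers from `ratesFrom(ev)` once `session.started` arrives.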