This is the shortest path to a working voice loop. Pick the tab that matches where your code runs, copy the snippet, replace the API key, and you have a live session. For the full protocol (every event, every config field, per-provider notes, billing, transcription, tool flow), see the Realtime voice guide.

1. Get an API key

Create a project-scoped runtime API key in the Opper dashboard. Management keys (opmak-…) are rejected — only runtime keys can open realtime sessions.
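Because the wrong key type only fails once the connection is attempted, it can help to check the key locally first. The following is a minimal sketch, not part of any Opper SDK: the helper name assertRuntimeKey is hypothetical, and the only fact it relies on is the opmak- prefix for management keys mentioned above.

```javascript
// Hypothetical helper (not an Opper API): fail fast on a missing or wrong-type key.
const MANAGEMENT_KEY_PREFIX = "opmak-"; // management-key prefix quoted in the text above

function assertRuntimeKey(key) {
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("API key is missing or empty");
  }
  if (key.startsWith(MANAGEMENT_KEY_PREFIX)) {
    throw new Error("Management keys cannot open realtime sessions; use a runtime key");
  }
  // Anything else is passed through; the server remains the final authority.
  return key;
}
```

A typical call would be assertRuntimeKey(process.env.OPPER_API_KEY) before opening the WebSocket. The check only rules out the known management prefix; it does not prove the key is valid.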

2. Open a session

Server-side clients connect directly with a bearer token. This is the fastest way to verify the endpoint works end to end. Install the ws package first:

npm install ws

Then connect and send session.start:

import WebSocket from "ws";

const ws = new WebSocket("wss://api.opper.ai/v3/realtime", {
  headers: { Authorization: `Bearer ${process.env.OPPER_API_KEY}` },
});

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.start",
    config: {
      model: "openai/gpt-realtime-2",
      voice: "marin",
      instructions: "You are a concise voice assistant.",
    },
  }));
});

ws.on("message", (raw) => {
  const ev = JSON.parse(raw.toString());
  if (ev.type === "session.started") {
    console.log(`live: ${ev.session_id} @ ${ev.output_sample_rate}Hz`);
  }
  if (ev.type === "audio.delta") {
    // ev.audio is base64-encoded PCM16 — pipe to your audio player.
  }
});

3. What you’ll see

On a successful open:
{ "type": "session.started", "session_id": "sess_...", "input_sample_rate": 24000, "output_sample_rate": 24000, "audio_format": "pcm16" }
From here, stream audio in with audio.append (base64-encoded PCM16 at input_sample_rate), and the assistant’s audio comes back in audio.delta frames at output_sample_rate. If a tool fires you get a tool.call and reply with tool.result. When the session ends you get session.terminating followed by a clean WebSocket close.
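The base64 PCM16 framing described above can be sketched with Node's built-in Buffer. This is an illustration under stated assumptions, not the documented schema: the helper names are hypothetical, the audio field on audio.append is assumed to mirror the audio field on audio.delta, and samples are assumed to be little-endian, which matches how Int16Array is laid out on common platforms. Check the Realtime protocol reference for the authoritative payload shape.

```javascript
// Encode an Int16Array of PCM16 samples as base64, honoring the view's offset and length.
function pcm16ToBase64(samples) {
  return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength).toString("base64");
}

// Decode base64 PCM16 (for example ev.audio from an audio.delta event) to an Int16Array.
function base64ToPcm16(b64) {
  const bytes = Buffer.from(b64, "base64");
  // Copy into a fresh buffer: pooled Buffers are not guaranteed to be 2-byte aligned.
  const copy = new Uint8Array(bytes);
  return new Int16Array(copy.buffer, 0, Math.floor(copy.byteLength / 2));
}

// Build an audio.append frame. ASSUMPTION: the payload field is named `audio`.
function audioAppendFrame(samples) {
  return JSON.stringify({ type: "audio.append", audio: pcm16ToBase64(samples) });
}

// Duration of a chunk in milliseconds at a given sample rate (mono).
function chunkDurationMs(sampleCount, sampleRate) {
  return (sampleCount * 1000) / sampleRate;
}
```

With these helpers, sending a chunk would look like ws.send(audioAppendFrame(chunk)), where chunk holds samples at the input_sample_rate reported in session.started. At 24000 Hz, a 480-sample chunk covers 20 ms of audio.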

Next steps

  • Realtime voice guide — every config field, every event, per-provider notes (OpenAI / xAI / Gemini), tool calls, transcription, billing, session limits, security model.
  • Realtime protocol reference — the WebSocket event vocabulary at a glance, with sample JSON payloads.
  • Mint endpoint reference: POST /v3/realtime-sessions request/response schema.
  • brainstorm-time cookbook — a complete browser voice app with microphone capture, audio playback, tools, and provider switching.

Switch providers

The protocol is unified — change one string:
| Provider | Model id | Notes |
| --- | --- | --- |
| OpenAI | openai/gpt-realtime-2 | Reasoning effort supported. 24 kHz symmetric. |
| xAI | xai/grok-voice-latest | Per-minute billing. 24 kHz symmetric. |
| Gemini | gemini/gemini-3.1-flash-live-preview | Asymmetric sample rates: 16 kHz in / 24 kHz out. |
See Per-provider notes for voice lists and quirks.
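One way to keep the provider switch to a single string in application code is a small lookup keyed by provider. The sketch below is illustrative only: PROVIDERS and sessionStartFor are hypothetical names, the model ids and sample rates are copied from the list above, and the voice field is left out because voice names differ per provider. Whether the server applies a default voice is not stated here, so add one from the Per-provider notes if it is required.

```javascript
// Model ids and nominal sample rates as listed above (rates in Hz).
const PROVIDERS = new Map([
  ["openai", { model: "openai/gpt-realtime-2", inputRate: 24000, outputRate: 24000 }],
  ["xai", { model: "xai/grok-voice-latest", inputRate: 24000, outputRate: 24000 }],
  ["gemini", { model: "gemini/gemini-3.1-flash-live-preview", inputRate: 16000, outputRate: 24000 }],
]);

// Build a session.start message for the chosen provider.
function sessionStartFor(provider, instructions) {
  const entry = PROVIDERS.get(provider);
  if (!entry) {
    throw new Error(`unknown provider: ${provider}`);
  }
  return {
    type: "session.start",
    config: { model: entry.model, instructions },
  };
}
```

Sending it would look like ws.send(JSON.stringify(sessionStartFor("gemini", "You are a concise voice assistant."))). The rates in the map are for planning capture and playback only; at runtime, prefer the input_sample_rate and output_sample_rate values reported in session.started.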