This is the shortest path to a working voice loop. Pick the tab that matches where your code runs, copy the snippet, and swap in your API key. For the full protocol (events, config fields, per-provider notes, billing, transcription, tool flow), see the Realtime protocol.

Run your first session

1. Get an API key

Create a project-scoped runtime API key in the Opper dashboard.

2. Open a session

Server-side clients connect directly with a bearer token. This is the quickest way to check that the endpoint works.

npm install ws

import WebSocket from "ws";

const ws = new WebSocket("wss://api.opper.ai/v3/realtime", {
  headers: { Authorization: `Bearer ${process.env.OPPER_API_KEY}` },
});

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.start",
    config: {
      model: "openai/gpt-realtime-2",
      voice: "marin",
      instructions: "You are a concise voice assistant.",
    },
  }));
});

ws.on("message", (raw) => {
  const ev = JSON.parse(raw.toString());
  if (ev.type === "session.started") {
    console.log(`live: ${ev.session_id} @ ${ev.output_sample_rate}Hz`);
  }
  if (ev.type === "audio.delta") {
    // ev.audio is base64-encoded PCM16. Pipe to your audio player.
  }
});

3. Watch the events

On a successful open:
{ "type": "session.started", "session_id": "sess_...", "input_sample_rate": 24000, "output_sample_rate": 24000, "audio_format": "pcm16" }
From here, stream audio in with audio.append (base64-encoded PCM16 at input_sample_rate), and the assistant’s audio comes back in audio.delta frames at output_sample_rate. If a tool fires you get a tool.call and reply with tool.result. When the session ends you get session.terminating followed by a clean WebSocket close.
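
The base64 PCM16 framing described above can be sketched with two small Node.js helpers. This is a minimal illustration, not the official client: the `audio` field on `audio.append` is an assumption mirrored from `audio.delta`, the helper names `buildAudioAppend` and `decodeAudioDelta` are hypothetical, and the samples are assumed to be little-endian, which matches the byte order of typed arrays on common platforms. Check the Realtime protocol for the exact message shape.

```javascript
// Wrap PCM16 samples (Int16Array) in an audio.append message.
// Assumption: the payload field is named `audio`, as it is on audio.delta.
function buildAudioAppend(samples) {
  const bytes = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
  return JSON.stringify({ type: "audio.append", audio: bytes.toString("base64") });
}

// Decode the base64 payload of an audio.delta event into PCM16 samples.
function decodeAudioDelta(ev) {
  const bytes = Buffer.from(ev.audio, "base64");
  // Copy into a fresh buffer so the 16-bit view starts at an aligned offset.
  const copy = new Uint8Array(bytes);
  return new Int16Array(copy.buffer, 0, copy.byteLength >> 1);
}
```

With an open socket, `ws.send(buildAudioAppend(chunk))` would stream one chunk in, and `decodeAudioDelta(ev)` would give samples to hand to an audio player at `output_sample_rate`.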

Switch providers

The protocol is the same across providers. Change one string:
| Provider | Model id | Notes |
| --- | --- | --- |
| OpenAI | openai/gpt-realtime-2 | Reasoning effort supported. 24 kHz symmetric. |
| xAI | xai/grok-voice-latest | Per-minute billing. 24 kHz symmetric. |
| Gemini | gemini/gemini-3.1-flash-live-preview | Asymmetric sample rates: 16 kHz in, 24 kHz out. |
See Per-provider notes for voice lists and quirks.
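
Because input rates differ between providers, audio captured at one rate may need converting to the `input_sample_rate` reported in `session.started`. The helper below is a rough sketch of that conversion using linear interpolation; `resamplePcm16` is a hypothetical name, mono audio is assumed, and no low-pass filter is applied, so a proper resampling library is likely a better fit for production audio.

```javascript
// Resample mono PCM16 samples from one rate to another by linear interpolation.
// Sketch only: downsampling without filtering can introduce aliasing.
function resamplePcm16(input, fromRate, toRate) {
  if (fromRate === toRate) return input;
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Int16Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = Math.round(input[i0] * (1 - frac) + input[i1] * frac);
  }
  return out;
}
```

For example, audio captured at 24 kHz could be passed through `resamplePcm16(samples, 24000, ev.input_sample_rate)` before sending, which returns the input unchanged when the two rates already match.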

What’s next

Realtime protocol

Every config field, event, and per-provider note.

Models

The full list of supported realtime model IDs.

Mint endpoint

POST /v3/realtime-sessions request and response schema.

Cookbook example

A complete browser voice app with microphone capture and tool calls.