Streaming shows the response as it’s generated. SetDocumentation Index
Fetch the complete documentation index at: https://docs.opper.ai/llms.txt
Use this file to discover all available pages before exploring further.
stream: true and you get an iterator of chunks instead of waiting for the whole response.
This works the same way it does in the OpenAI SDK. If you’ve streamed an OpenAI completion before, you’ve already written this code.
A working example
What’s in each chunk
Each streamed chunk looks like a normal chat completion, except:choices[0].deltacarries the new content (instead ofchoices[0].message).delta.contentis the next piece of text. May be empty on some chunks.delta.tool_callscarries fragments of a tool call as the model assembles it (see below).finish_reasonisnullwhile streaming. The final chunk sets it tostop,length,tool_calls, orcontent_filter.
Streaming tool calls
When the model decides to call a tool, the arguments arrive as JSON fragments across multiple chunks. Concatenate them as they come in, then parse once the call is complete.Python
Errors mid-stream
If something goes wrong after streaming has started (a Guard rule rejects the output, the upstream model fails, a timeout fires), the stream ends with an error chunk. Always handle exceptions around the iteration.What’s next
Tool calling
The full tool-use round trip.
Conversations
Multi-turn chat with message history.
JSON API streaming
Field-level streaming instead of token-level.