OpenAI-compatible streaming
Endpoint: POST /api/openai/v1/chat/completions
This endpoint implements the official OpenAI Chat Completions streaming format. When stream: true, it returns a Server-Sent Events (SSE) stream of chat.completion.chunk messages followed by a final [DONE].
Note: You can call Claude models through this OpenAI-compatible surface; the request must still follow the OpenAI Chat format. See Cross‑provider compatibility.
Request schema
- model (string) – required
- messages (array) – required, OpenAI-style messages
- stream (boolean) – optional, must be boolean
- temperature (number 0..2), max_tokens (int), max_completion_tokens (int)
- top_p (0..1], stop (string | string[])
- frequency_penalty (-2..2), presence_penalty (-2..2)
- response_format (object) – type: text | json_object | json_schema
  - If type: 'json_schema', provide json_schema: { name: string; schema?: object; strict?: boolean }
- reasoning_effort (string) – low | medium | high (where applicable)
- Tools & legacy function calling: tools, tool_choice, function_call, functions
- Additional compatibility: n, seed, user, logit_bias, parallel_tool_calls, service_tier, store, metadata
- stream_options (object) – { include_usage?: boolean }. When set, a final chunk includes usage totals.
- web_search_options (object) – forwarded for models that support retrieval/web features
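As a concrete illustration of combining several of these fields, a request body might look like the following sketch (the model name mirrors the cURL example below; the JSON schema itself is made up):

```typescript
// Illustrative request body exercising response_format and stream_options.
// The schema under json_schema is an example, not a required shape.
const body = {
  model: "gpt-5-mini-2025-08-07",
  messages: [{ role: "user", content: "List three facts about routers as JSON" }],
  stream: true,
  temperature: 0.7,
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "router_facts",
      schema: {
        type: "object",
        properties: { facts: { type: "array", items: { type: "string" } } },
      },
      strict: true,
    },
  },
  stream_options: { include_usage: true }, // ask for the usage-only chunk before [DONE]
};
```

Serialize this with `JSON.stringify(body)` as the POST payload, exactly as in the cURL example below.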
Notes
- The router does not forward client-provided prompt_cache_key on Chat Completions; prompt-cache scoping is enforced internally. For prompt cache retention controls on the OpenAI Responses API, see prompt_cache_retention in OpenAI Responses.
- Invalid combinations will be rejected with HTTP 400 and a descriptive message.
cURL (streaming)
curl -N -X POST "https://api.kushrouter.com/api/openai/v1/chat/completions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5-mini-2025-08-07",
"messages": [{"role":"user","content":"Write a limerick about routers"}],
"stream": true
}'

Streaming frames
Each frame is an object like:
{
"id": "chatcmpl-...",
"object": "chat.completion.chunk",
"created": 1739123456,
"model": "gpt-5-mini-2025-08-07",
"choices": [
{ "index": 0, "delta": { "content": "..." }, "finish_reason": null }
]
}

Tool calls stream as incremental function arguments under choices[0].delta.tool_calls[].function.arguments. If a provider only supplies final arguments at stop, a final delta is emitted containing the complete arguments. The last frame sets finish_reason to tool_calls or stop.
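The accumulation of incremental tool-call deltas can be sketched as follows (the types are illustrative, not an official SDK shape):

```typescript
// Merge tool_call deltas from successive chat.completion.chunk frames into
// complete tool calls. id and name usually arrive once; arguments arrive as
// string fragments that must be concatenated, then JSON-parsed at the end.
type ToolCallDelta = {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};

function accumulateToolCalls(
  acc: { id: string; name: string; arguments: string }[],
  deltas: ToolCallDelta[]
): { id: string; name: string; arguments: string }[] {
  for (const d of deltas) {
    const slot = (acc[d.index] ??= { id: "", name: "", arguments: "" });
    if (d.id) slot.id = d.id;
    if (d.function?.name) slot.name += d.function.name;
    if (d.function?.arguments) slot.arguments += d.function.arguments;
  }
  return acc;
}
```

Feed each frame's choices[0].delta.tool_calls into this, and JSON-parse each accumulated arguments string only after the frame carrying finish_reason: "tool_calls".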
Final finish_reason and usage-in-stream
When streaming completes, a final chunk includes a choices[0].delta with no content and a finish_reason (e.g., "stop"), followed by [DONE]. If you set stream_options.include_usage: true, we send a usage-only chunk just before [DONE].
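Putting the frame format together, a minimal client-side helper can extract the streamed text from raw SSE lines (a sketch; the fetch wiring around it is omitted):

```typescript
// Pure helper: given raw SSE text, concatenate the streamed content deltas.
// Stops at the [DONE] sentinel; frames without content (the final
// finish_reason chunk, or a usage-only chunk) contribute nothing.
function extractContent(sseText: string): string {
  let text = "";
  for (const line of sseText.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break;
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}
```

A real client would feed decoded bytes from the response body's reader into this logic incrementally, buffering any partial trailing line between reads.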
Example transcript (abbreviated):
data: {"id":"chatcmpl_...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl_...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}
// Final textual chunk indicating finish_reason
data: {"id":"chatcmpl_...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
// Optional usage-only chunk when stream_options.include_usage=true
data: {"id":"chatcmpl_...","object":"chat.completion.chunk","usage":{"prompt_tokens":12,"completion_tokens":10,"total_tokens":22}}
data: [DONE]

Errors
- 400 – invalid JSON or unknown/unsupported parameters
- 401 – missing or invalid API key
- 413 – payload too large
- 429 – rate limit exceeded (key or IP)
- 5xx – transient errors
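A client might treat 429 and 5xx as retryable and everything else as final. A minimal backoff sketch (the delay values are arbitrary choices, not documented server behavior):

```typescript
// Retry transient failures with exponential backoff. 400/401/413 responses
// and successes are returned immediately; 429 and 5xx are retried.
async function postWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= maxRetries) return res;
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt)); // 0.5s, 1s, 2s, ...
  }
}
```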
Post-stream usage
After a stream finishes, you can fetch token usage and metadata by generation ID (the id from stream chunks):
const id = 'chatcmpl-...';
const res = await fetch('https://api.kushrouter.com/api/v1/generations?id=' + encodeURIComponent(id), {
headers: { Authorization: `Bearer ${process.env.KUSHROUTER_API_KEY}` }
});
const { generation } = await res.json();
console.log(generation.usage);

Reasoning details
- When supported, you can request additional computational effort using reasoning_effort.
- In both streaming and non‑streaming modes, usage.completion_tokens_details may include reasoning_tokens.
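For illustration, here is a hypothetical request opting into extra reasoning effort, plus a small helper that reads the reasoning breakdown from a usage payload (field names follow the text above; the fallback of 0 for models that omit the breakdown is an assumption):

```typescript
// Hypothetical request asking for high reasoning effort on a supported model.
const request = {
  model: "gpt-5-mini-2025-08-07",
  messages: [{ role: "user", content: "Plan a three-step proof" }],
  reasoning_effort: "high",
  stream: true,
  stream_options: { include_usage: true },
};

// Read reasoning_tokens from a usage payload, defaulting to 0 when absent.
function reasoningTokens(usage: {
  completion_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}): number {
  return usage.completion_tokens_details?.reasoning_tokens ?? 0;
}
```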
Model support notes
- GPT‑5 family: supports reasoning_effort; reasoning summaries are available via the OpenAI Responses API.
- O‑series (o3, o4-mini): support reasoning summary modes.

To request reasoning summaries, prefer the OpenAI Responses endpoint. See OpenAI Responses.