API reference
OpenAI Chat Completions (/api/openai/v1/chat/completions)

OpenAI-compatible streaming

Endpoint: POST /api/openai/v1/chat/completions

This endpoint implements the official OpenAI Chat Completions streaming format. When stream: true, it returns a Server-Sent Events (SSE) stream of chat.completion.chunk messages followed by a final [DONE].

Note: You can call Claude models through this OpenAI-compatible surface; the request must still follow the OpenAI Chat format. See Cross‑provider compatibility.

Request schema

  • model (string) – required
  • messages (array) – required, OpenAI-style messages
  • stream (boolean) – optional; when true, the response is an SSE stream as described above
  • temperature (number 0..2), max_tokens (int), max_completion_tokens (int)
  • top_p (0..1], stop (string|string[])
  • frequency_penalty (-2..2), presence_penalty (-2..2)
  • response_format (object)
    • type: text | json_object | json_schema
    • If type is json_schema, provide json_schema: { name: string; schema?: object; strict?: boolean }
  • reasoning_effort (string) – low | medium | high (where applicable)
  • Tools & legacy function calling: tools, tool_choice, function_call, functions
  • Additional compatibility: n, seed, user, logit_bias, parallel_tool_calls, service_tier, store, metadata
  • stream_options (object) – { include_usage?: boolean }. When set, a final chunk includes usage totals.
  • web_search_options (object) – forwarded for models that support retrieval/web features
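For illustration, a request body combining several of the optional fields above might look like this (the model name and schema are placeholders):

```json
{
  "model": "gpt-5-mini-2025-08-07",
  "messages": [{ "role": "user", "content": "List three routers as JSON" }],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": true,
  "stream_options": { "include_usage": true },
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "router_list",
      "schema": {
        "type": "object",
        "properties": { "routers": { "type": "array", "items": { "type": "string" } } },
        "required": ["routers"]
      },
      "strict": true
    }
  }
}
```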

Notes

  • The router does not forward client-provided prompt_cache_key on Chat Completions; prompt-cache scoping is enforced internally. For prompt cache retention controls on the OpenAI Responses API, see prompt_cache_retention in OpenAI Responses.

Invalid combinations will be rejected with HTTP 400 and a descriptive message.

cURL (streaming)

curl -N -X POST "https://api.kushrouter.com/api/openai/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-mini-2025-08-07",
    "messages": [{"role":"user","content":"Write a limerick about routers"}],
    "stream": true
  }'

Streaming frames

Each frame is an object like:

{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1739123456,
  "model": "gpt-5-mini-2025-08-07",
  "choices": [
    { "index": 0, "delta": { "content": "..." }, "finish_reason": null }
  ]
}

Tool calls stream as incremental function arguments under choices[0].delta.tool_calls[].function.arguments. If a provider only supplies final arguments at stop, a final delta is emitted containing the complete arguments. The last frame sets finish_reason to tool_calls or stop.
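A minimal client-side accumulator for these incremental tool-call frames might look like the sketch below (frame shapes follow the chunk format shown in this section; the frames themselves are illustrative):

```javascript
// Accumulate streamed tool-call deltas into complete tool calls.
// The first frame for a given index typically carries id and function.name;
// later frames append fragments to function.arguments.
function accumulateToolCalls(frames) {
  const calls = [];
  for (const frame of frames) {
    for (const delta of frame.choices?.[0]?.delta?.tool_calls ?? []) {
      const i = delta.index ?? 0;
      calls[i] ??= { id: '', function: { name: '', arguments: '' } };
      if (delta.id) calls[i].id = delta.id;
      if (delta.function?.name) calls[i].function.name += delta.function.name;
      if (delta.function?.arguments) calls[i].function.arguments += delta.function.arguments;
    }
  }
  return calls;
}
```

Once finish_reason is tool_calls, each accumulated arguments string should parse as complete JSON.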

Final finish_reason and usage-in-stream

When streaming completes, a final chunk includes a choices[0].delta with no content and a finish_reason (e.g., "stop"), followed by [DONE]. If you set stream_options.include_usage: true, we send a usage-only chunk just before [DONE].

Example transcript (abbreviated):

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

// Final textual chunk indicating finish_reason
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

// Optional usage-only chunk when stream_options.include_usage=true
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","usage":{"prompt_tokens":12,"completion_tokens":10,"total_tokens":22}}

data: [DONE]
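A transcript like the one above can be decoded with a small helper. This sketch assumes each event is a single "data: ..." line, as in the examples:

```javascript
// Parse raw SSE text into an array of chunk objects, stopping at the
// [DONE] sentinel. Blank lines and comments are skipped.
function parseSseChunks(raw) {
  const chunks = [];
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length).trim();
    if (payload === '[DONE]') break; // end of stream
    chunks.push(JSON.parse(payload));
  }
  return chunks;
}
```

Joining each chunk's choices[0].delta.content then yields the full completion text.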

Errors

  • 400 – invalid JSON or unknown/unsupported parameters
  • 401 – missing or invalid API key
  • 413 – payload too large
  • 429 – rate limit exceeded (key or IP)
  • 5xx – transient errors
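One way to map these status categories to client behavior is to retry only the transient ones (429 and 5xx) with exponential backoff; the thresholds below are illustrative defaults, not server-mandated values:

```javascript
// Per the status list above: 429 and 5xx are transient and worth
// retrying; other 4xx statuses indicate a caller error.
function shouldRetry(status) {
  return status === 429 || (status >= 500 && status < 600);
}

// Exponential backoff with a cap, in milliseconds.
function backoffMs(attempt, baseMs = 500, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```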

Post-stream usage

After a stream finishes, you can fetch token usage and metadata by generation ID (the id from stream chunks):

const id = 'chatcmpl-...';
const res = await fetch('https://api.kushrouter.com/api/v1/generations?id=' + encodeURIComponent(id), {
  headers: { Authorization: `Bearer ${process.env.KUSHROUTER_API_KEY}` }
});
if (!res.ok) throw new Error(`generation lookup failed: ${res.status}`);
const { generation } = await res.json();
console.log(generation.usage); // { prompt_tokens, completion_tokens, total_tokens, ... }

Reasoning details

  • When supported, you can request additional computational effort using reasoning_effort.
  • In both streaming and non‑streaming modes, the usage.completion_tokens_details object may include reasoning_tokens.
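Since completion_tokens_details is only present when the model reports it, reading the count defensively might look like this sketch:

```javascript
// Extract the reasoning-token count from a usage payload, tolerating
// responses that omit completion_tokens_details entirely.
function reasoningTokens(usage) {
  return usage?.completion_tokens_details?.reasoning_tokens ?? 0;
}
```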

Model support notes

  • GPT‑5 family: supports reasoning_effort; reasoning summaries are available via the OpenAI Responses API.
  • O‑series (o3, o4-mini): support reasoning summary modes.

To request reasoning summaries, prefer the OpenAI Responses endpoint. See OpenAI Responses.

See also