OpenAI Streaming

OpenAI-compatible streaming

Endpoint: POST /api/openai/v1/chat/completions

This endpoint implements the OpenAI Chat Completions streaming format. When the request sets stream: true, the response is a Server-Sent Events (SSE) stream of chat.completion.chunk messages, terminated by a final data: [DONE] line.

Request schema

  • model (string) – required
  • messages (array) – required, OpenAI-style messages
  • stream (boolean) – optional; if present, must be a JSON boolean (true or false)
  • temperature (number 0..2), max_tokens (int), max_completion_tokens (int)
  • top_p (0..1], stop (string|string[])
  • frequency_penalty (-2..2), presence_penalty (-2..2)
  • response_format (object)
    • type: text | json_object | json_schema
    • If type: 'json_schema', provide json_schema: { name: string; schema?: object; strict?: boolean }
  • Tools & legacy function calling: tools, tool_choice, function_call, functions
  • Additional compatibility: n, seed, user, logit_bias, parallel_tool_calls, service_tier, store, metadata
  • stream_options (object) – { include_usage?: boolean }. When include_usage is true, a final chunk carries the usage totals.

Invalid parameter combinations are rejected with HTTP 400 and a descriptive error message.
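To catch mistakes before they reach the server, the ranges listed above can be checked on the client. The following is a minimal sketch, not the server's actual validation: the function name validate_request is hypothetical, it covers only some of the fields, and it assumes the 0..2 and -2..2 ranges are inclusive at both ends.

```python
from typing import Any, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_request(body: dict) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors: List[str] = []

    if not isinstance(body.get("model"), str):
        errors.append("model is required and must be a string")
    if not isinstance(body.get("messages"), list):
        errors.append("messages is required and must be an array")
    if "stream" in body and not isinstance(body["stream"], bool):
        errors.append("stream must be a boolean")

    # Ranges assumed to be closed intervals (an assumption, see lead-in).
    for name, low, high in (
        ("temperature", 0, 2),
        ("frequency_penalty", -2, 2),
        ("presence_penalty", -2, 2),
    ):
        value = body.get(name)
        if value is not None and not (_is_number(value) and low <= value <= high):
            errors.append(f"{name} must be a number in {low}..{high}")

    # top_p is documented as (0..1]: zero excluded, one included.
    top_p = body.get("top_p")
    if top_p is not None and not (_is_number(top_p) and 0 < top_p <= 1):
        errors.append("top_p must be in (0, 1]")

    response_format = body.get("response_format")
    if response_format is not None:
        allowed = ("text", "json_object", "json_schema")
        if not isinstance(response_format, dict) or response_format.get("type") not in allowed:
            errors.append("response_format.type must be text, json_object or json_schema")
        elif response_format["type"] == "json_schema":
            schema = response_format.get("json_schema")
            if not isinstance(schema, dict) or not isinstance(schema.get("name"), str):
                errors.append("json_schema.name is required when type is json_schema")

    return errors
```

A passing client-side check does not guarantee acceptance; the server remains the authority and may still answer with HTTP 400.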

cURL (streaming)

curl -N -X POST "https://api.kushrouter.com/api/openai/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role":"user","content":"Write a limerick about routers"}],
    "stream": true
  }'
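The same request can be sent from Python with only the standard library. This is a rough equivalent of the cURL call above, offered as a sketch: build_request and stream_lines are hypothetical helper names, and error handling is left out.

```python
import json
import urllib.request

URL = "https://api.kushrouter.com/api/openai/v1/chat/completions"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST request with the same headers and body as the cURL example."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request):
    """Yield the response line by line as it arrives (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        for raw in response:
            yield raw.decode("utf-8").rstrip("\r\n")


# Usage (needs a valid key and network access):
#   import os
#   request = build_request(os.environ["API_KEY"], "Write a limerick about routers")
#   for line in stream_lines(request):
#       print(line)
```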

Streaming frames

Each frame is an SSE data: line whose payload is a JSON object like:

{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1739123456,
  "model": "gpt-4o-mini",
  "choices": [
    { "index": 0, "delta": { "content": "..." }, "finish_reason": null }
  ]
}
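A client typically strips the data: prefix from each line, stops at [DONE], and concatenates the delta.content fragments. The sketch below assumes each frame's JSON fits on a single data: line; iter_chunks and collect_text are hypothetical names, and the input is any iterable of already-decoded text lines.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chat.completion.chunk objects until the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Join the content deltas of choice 0 into the full message text."""
    parts = []
    for chunk in iter_chunks(lines):
        # choices may be empty, for example on a usage-only final chunk.
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

Because iter_chunks is a generator, text can also be printed fragment by fragment as it arrives instead of being collected first.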

Tool calls stream incrementally: fragments of the function arguments arrive under choices[0].delta.tool_calls[].function.arguments, and the client concatenates them in order. If a provider only supplies the final arguments when generation stops, a single final delta is emitted containing the complete arguments. The last frame sets finish_reason to tool_calls or stop.
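Reassembling tool calls therefore amounts to appending argument fragments per call until the finish_reason arrives. The sketch below assumes the tool-call deltas follow the OpenAI chunk layout, where each entry carries an index and, on its first fragment, an id and function.name; accumulate_tool_calls and parse_arguments are hypothetical names. It works for both the incremental case and the single final delta.

```python
import json
from typing import Iterable, List, Optional, Tuple


def accumulate_tool_calls(chunks: Iterable[dict]) -> Tuple[List[dict], Optional[str]]:
    """Merge streamed tool-call deltas of choice 0; return (calls, finish_reason)."""
    calls = {}  # tool-call index -> {"id", "name", "arguments"}
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta") or {}
            for tool_call in delta.get("tool_calls") or []:
                slot = calls.setdefault(
                    tool_call.get("index", 0),
                    {"id": None, "name": None, "arguments": ""},
                )
                if tool_call.get("id"):
                    slot["id"] = tool_call["id"]
                function = tool_call.get("function") or {}
                if function.get("name"):
                    slot["name"] = function["name"]
                # Argument fragments are partial JSON text; append in arrival order.
                slot["arguments"] += function.get("arguments") or ""
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return [calls[i] for i in sorted(calls)], finish_reason


def parse_arguments(call: dict) -> dict:
    """Decode the arguments only after the stream has finished."""
    return json.loads(call["arguments"])
```

The arguments string is only valid JSON once the stream has ended, so decoding is deferred until finish_reason has been seen.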

Errors

  • 400 – invalid JSON or unknown/unsupported parameters
  • 401 – missing or invalid API key
  • 413 – payload too large
  • 429 – rate limit exceeded (key or IP)
  • 5xx – transient errors
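One common way to act on these codes is to retry only the ones that can succeed on a later attempt. The sketch below treats 429 and 5xx as retryable and everything else as final; that split, the exponential backoff schedule, and the names is_retryable and call_with_retry are assumptions made for illustration, not documented behavior of this endpoint. The send argument is a hypothetical callable that performs one request and returns (status, body).

```python
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """429 (rate limit) and 5xx (transient) may succeed later; 400/401/413 will not."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() up to max_attempts times, doubling the delay between tries."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if not is_retryable(status):
            break
        sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
        status, body = send()
    return status, body
```

Passing sleep as a parameter keeps the helper testable without real waiting. For streaming requests, retrying is only straightforward before the first frame has been consumed.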