OpenAI Responses (Streaming and Non‑Streaming)

Endpoint: POST /api/openai/v1/responses

Use this endpoint for requests in the OpenAI "Responses" API style. It supports both non‑streaming JSON responses and SSE streaming with official Responses events. Authenticate by sending the header Authorization: Bearer $API_KEY.

Request schema

Required:

model (string)

One of the supported OpenAI-compatible model ids.

Input options (provide the prompt through input or messages; instructions and system are optional additions):

input (string | array | object with content) – direct content. If string, it's treated as a user message. If array, it's an array of content parts.
messages (array) – OpenAI Chat-style messages. Content arrays are normalized (e.g., input_text coerced to text).
instructions (string) – optional system/instructional text. When combined with input, it is sent as a system message.
system (string) – optional system prompt (alternative to instructions).

Tools:

tools (array) – OpenAI function tools list.
tool_choice (string | object) – e.g., "auto", or { type: "function", function: { name: "..." } }.

Generation controls:

temperature (number) – 0..2
top_p (number) – 0..1
max_output_tokens (integer) – maximum tokens for output
response_format (object) – { type: "text" | "json_object" | "json_schema", json_schema? }
reasoning_effort (string) – low | medium | high (where applicable)
mcp_servers (array) – MCP server configurations, if used
stream (boolean) – when true, returns an SSE stream of official Responses events

Notes:

Unknown top-level parameters are rejected with HTTP 400.
Payloads larger than the allowed size return HTTP 413.
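A request body that follows the schema above could be assembled as in this minimal Python sketch. The helper name build_request and the client-side checks are illustrative assumptions, including the rule that at least one of input or messages must be present; the server remains the authority on what is valid.

```python
# Optional top-level parameters listed in the request schema above.
OPTIONAL_KEYS = {
    "input", "messages", "instructions", "system",
    "tools", "tool_choice", "temperature", "top_p",
    "max_output_tokens", "response_format", "reasoning_effort",
    "mcp_servers", "stream",
}


def build_request(model, **params):
    """Hypothetical helper: assemble a request body and catch obvious mistakes locally."""
    unknown = set(params) - OPTIONAL_KEYS
    if unknown:
        # The server rejects unknown top-level parameters with HTTP 400,
        # so failing early saves a round trip.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if "input" not in params and "messages" not in params:
        # Assumption: a request needs at least one of these to carry the prompt.
        raise ValueError("provide input or messages")
    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be within 0..2")
    top_p = params.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be within 0..1")
    return {"model": model, **params}
```

Calling build_request("openai:gpt-4o-2024-11-20", input="Say hello", stream=True) returns a dict that can be serialized with json.dumps and sent as the POST body.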

Non‑streaming response shape

When stream is false (the default), the response is a single JSON object. If the model invokes tools, the object may instead have status: "requires_action" and list the calls to execute under required_action.submit_tool_outputs.tool_calls.

Example (completed without tools):

{
  "id": "resp_...",
  "object": "response",
  "model": "openai:gpt-4o-2024-11-20",
  "created_at": 1739123456,
  "status": "completed",
  "output": [{ "type": "output_text", "text": "Hello!" }],
  "output_text": "Hello!",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 10,
    "total_tokens": 22
  }
}

Example (tools required):

{
  "id": "resp_...",
  "object": "response",
  "model": "openai:gpt-4o-2024-11-20",
  "created_at": 1739123456,
  "status": "requires_action",
  "required_action": {
    "type": "submit_tool_outputs",
    "submit_tool_outputs": {
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\n  \"city\": \"Boston\"\n}"
          }
        }
      ]
    }
  },
  "usage": {
    "input_tokens": 20,
    "output_tokens": 5,
    "total_tokens": 25
  }
}
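Reading the two response shapes above might look like the following sketch. The function name extract_tool_calls is hypothetical; the field names are taken from the examples, and note that function.arguments is a JSON-encoded string that has to be parsed separately.

```python
import json


def extract_tool_calls(response):
    """Return (call_id, function_name, parsed_arguments) for each pending tool call."""
    if response.get("status") != "requires_action":
        # A completed response carries its text in output_text instead.
        return []
    action = response.get("required_action", {})
    calls = action.get("submit_tool_outputs", {}).get("tool_calls", [])
    pending = []
    for call in calls:
        function = call["function"]
        # arguments is a string containing JSON, not a nested object.
        pending.append((call["id"], function["name"], json.loads(function["arguments"])))
    return pending
```

An empty list means there is nothing to execute and the caller can read output_text directly.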

Streaming (SSE) events

When stream: true, the connection emits official Responses events only. Expect the following sequence as applicable:

response.created
response.in_progress
response.output_item.added (for a message item)
response.content_part.added (initial output_text part)
response.output_text.delta (emitted repeatedly with text deltas)
response.function_call_arguments.delta (during tool arg streaming)
response.function_call_arguments.done (final tool args)
response.output_text.done (final text)
response.content_part.done
response.output_item.done
response.requires_action (if tools were called) OR response.completed (when finished)

The final event carries the response with status completed or requires_action and may include a usage object containing token counts and, when available, provider usage details.
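A client could consume the event sequence above roughly as follows. This is a sketch under assumptions: it expects standard SSE framing (event: and data: lines, with a blank line ending each event), JSON payloads, and a delta string field on response.output_text.delta as in the official Responses events. The function names are illustrative.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, payload) pairs from an iterable of decoded SSE lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data:
                payload = json.loads("\n".join(data))
                # Fall back to the payload's own type when no event: line was sent.
                yield event or payload.get("type"), payload
            event, data = None, []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())
    if data:
        # Flush a trailing event that was not followed by a blank line.
        payload = json.loads("\n".join(data))
        yield event or payload.get("type"), payload


def collect_text(events):
    """Concatenate text deltas until a terminal event arrives."""
    parts = []
    for name, payload in events:
        if name == "response.output_text.delta":
            parts.append(payload.get("delta", ""))
        elif name in ("response.completed", "response.requires_action"):
            break
    return "".join(parts)
```

With an HTTP client that exposes the body line by line, the decoded lines can be passed straight to iter_sse_events.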

Example curl:

curl -N https://api.kushrouter.com/api/openai/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai:gpt-4o-2024-11-20",
    "input": "Say hello",
    "stream": true
  }'

Errors

400 – invalid JSON or unknown/invalid parameters
401 – missing/invalid API key
413 – payload too large
429 – rate limit exceeded (may be returned before a stream starts)
500 – internal error
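One way a client might act on these status codes is sketched below. Which codes are worth retrying is a judgment call rather than something this endpoint documents: the sketch assumes 429 and 500 are transient while 400, 401, and 413 need a changed request. The names and the backoff constants are illustrative.

```python
# Assumption: rate limits and internal errors may succeed on a later attempt.
RETRYABLE_STATUSES = {429, 500}


def should_retry(status):
    """Return True when repeating the same request could plausibly succeed."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt, base=0.5, cap=8.0):
    """Exponential backoff in seconds for a zero-based attempt number, capped at cap."""
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for backoff_delay(attempt) between attempts and stop as soon as should_retry returns False.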

Tool calling notes

Tool calls stream incrementally as response.function_call_arguments.delta and finalize with response.function_call_arguments.done.
When a provider only supplies final arguments at the end, you still receive a final ...arguments.done event with complete arguments.
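Both cases above can be handled by one accumulator, as in this sketch. It assumes each payload carries an item_id identifying the tool call, a delta string on the delta event, and the complete arguments string on the done event, as in the official Responses events; the function name is hypothetical.

```python
import json


def collect_tool_arguments(events):
    """Map item id -> parsed arguments from (event_name, payload) pairs."""
    buffers, final = {}, {}
    for name, payload in events:
        key = payload.get("item_id", "default")
        if name == "response.function_call_arguments.delta":
            buffers[key] = buffers.get(key, "") + payload.get("delta", "")
        elif name == "response.function_call_arguments.done":
            # Prefer the complete arguments on the done event and fall back to
            # the concatenated deltas if the field is absent.
            raw = payload.get("arguments") or buffers.get(key, "")
            final[key] = json.loads(raw) if raw else {}
    return final
```

Parsing only on the done event avoids calling json.loads on a partial argument string.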
