OpenAI Responses (Streaming and Non‑Streaming)
Endpoint: POST /api/openai/v1/responses
Use this endpoint for the OpenAI "Responses" API style. It supports both non‑streaming JSON responses and SSE streaming with official Responses events. Authentication is via Authorization: Bearer $API_KEY.
Note: The `gpt-5-codex` model is supported via the Responses API only. Use `model: "gpt-5-codex"` with this endpoint.
Note: You can also call Claude models via this OpenAI Responses surface; the request must follow the OpenAI Responses syntax. See Cross‑provider compatibility.
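As a sketch of a basic call against this endpoint, the snippet below builds a minimal non-streaming request with Python's standard library. The base URL and API key are placeholders (assumptions, not values from this page); the request is constructed but not sent.

```python
import json
import urllib.request

API_KEY = "sk-..."  # placeholder; use your real KushRouter API key
BASE_URL = "https://api.kushrouter.com"  # assumed host; substitute your deployment's base URL

# Minimal non-streaming Responses payload.
payload = {
    "model": "gpt-5-codex",
    "input": "Write a haiku about routers.",
    "max_output_tokens": 200,
}

req = urllib.request.Request(
    f"{BASE_URL}/api/openai/v1/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment to actually send the request
```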
Request schema
Required:
- `model` (string) – one of the supported OpenAI-compatible model ids.
Input options (choose one of the following ways to provide input):
- `input` (string | array | object with `content`) – direct content. If a string, it is treated as a user message. If an array, it is an array of content parts.
- `messages` (array) – OpenAI Chat-style messages. Content arrays are normalized (e.g., `input_text` is coerced to `text`).
- `instructions` (string) – optional system/instructional text. When combined with `input`, it is sent as a system message.
- `system` (string) – optional system prompt (alternative to `instructions`).
Tools:
- `tools` (array) – OpenAI function tools list.
- `tool_choice` (string | object) – e.g., `"auto"`, or `{ "type": "function", "function": { "name": "..." } }`.
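A sketch of a request carrying one function tool and a forced `tool_choice`. The tool name and schema are hypothetical illustrations, not part of this API.

```python
payload = {
    "model": "gpt-5",  # assumed model id
    "input": "What's the weather in Berlin?",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Force the model to call this specific function instead of "auto".
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```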
Generation controls:
- `temperature` (number) – 0..2
- `top_p` (number) – 0..1
- `max_output_tokens` (integer) – maximum tokens for output
- `response_format` (object) – `{ "type": "text" | "json_object" | "json_schema", "json_schema"? }`
- `reasoning_effort` (string) – `low` | `medium` | `high` (where applicable)
- `reasoning_summary` (string) – alias for `reasoning.summary`. Accepted values depend on the model (see "Supported values by model" below). Common values include `auto`, `concise`, `detailed`, and, for some GPT‑5 models, `none`.
- `mcp_servers` (array) – MCP server configurations, if used
- `stream` (boolean) – when `true`, returns an SSE stream of official Responses events
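The generation controls above might be combined as follows; the model id is an assumed placeholder and the values are only examples within the documented ranges.

```python
payload = {
    "model": "gpt-5",  # assumed model id
    "input": "Return a JSON object with keys 'a' and 'b'.",
    "temperature": 0.2,           # within 0..2
    "top_p": 0.9,                 # within 0..1
    "max_output_tokens": 512,
    "response_format": {"type": "json_object"},
    "reasoning_effort": "medium",
    "reasoning_summary": "auto",  # alias for reasoning.summary
    "stream": False,              # non-streaming JSON response
}
```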
Additional OpenAI-compatible parameters supported by KushRouter (passed through when applicable):
- `web_search_options` (object) – enable web search features for models that support it
- `prompt_cache_retention` (`"24h"`) – opt-in retention for provider-side prompt cache (Responses API)
- `include` (string | string[]) – request additional data in events (see Include flags below)
- `previous_response_id` (string) – continue a prior Responses thread
- `tool_outputs` (array) – submit outputs after a `requires_action` response
- `truncation` (object) – output truncation controls
- `max_tool_calls` (number) – hard cap on parallel tool invocations
- `audio`, `modalities`, `prediction` – multimodal controls where supported
- `strict`, `verbosity` – extra validation/diagnostic knobs
- `metadata`, `store` – provider metadata and storage hints where supported
- `conversation` (string) – conversation/session identifier
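A sketch of continuing a prior Responses thread with `previous_response_id`. The response id and the `include` value shown are hypothetical placeholders; consult the Include flags section for the flags this endpoint actually accepts.

```python
followup = {
    "model": "gpt-5",                         # assumed model id
    "previous_response_id": "resp_abc123",    # hypothetical id taken from an earlier response
    "input": "Continue from where you left off.",
    "include": ["reasoning.encrypted_content"],  # illustrative include flag; real flags vary
    "prompt_cache_retention": "24h",             # opt in to provider-side prompt caching
}
```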
Notes:
- Unknown top-level parameters are rejected with HTTP 400.
- Payloads larger than allowed size return HTTP 413.
Reasoning summaries
Set a reasoning summary mode to receive thinking/summarization events for GPT‑5 family models.
- Send the shortcut field `"reasoning_summary": "auto" | "concise" | "detailed" | "none"` alongside the other request parameters.
- Alternatively, embed it inside the `reasoning` object: `"reasoning": { "effort": "high", "summary": "auto" }`.
- The value `none` explicitly disables provider summaries (other values require GPT‑5 models).
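The two equivalent ways of requesting summaries can be sketched side by side (model id assumed):

```python
# Nested form: summary mode inside the reasoning object.
nested = {
    "model": "gpt-5",  # assumed model id
    "input": "Prove that sqrt(2) is irrational.",
    "reasoning": {"effort": "high", "summary": "auto"},
}

# Shortcut form: flat top-level aliases for the same settings.
shortcut = {
    "model": "gpt-5",
    "input": "Prove that sqrt(2) is irrational.",
    "reasoning_effort": "high",
    "reasoning_summary": "auto",
}
```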
Supported values by model
KushRouter validates `reasoning.summary`/`reasoning_summary` up‑front and returns HTTP 400 for unsupported values with a descriptive error. Support varies by model.
Streaming behavior
When summaries are enabled and the model supports them, the SSE stream may emit:
- `response.reasoning_summary_part.added`
- `response.reasoning_summary_part.done`
- `response.reasoning_summary_text.delta`
- `response.reasoning_summary_text.done`
These deltas accumulate into the final `reasoning.summaries[*].text` payload.
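Accumulating the summary deltas might look like the sketch below. The event dictionaries are illustrative shapes (only the `type` names come from this page, the `delta`/`text` field names are assumptions).

```python
# Sample summary events as they might arrive over SSE (shapes illustrative).
events = [
    {"type": "response.reasoning_summary_text.delta", "delta": "Weighing two "},
    {"type": "response.reasoning_summary_text.delta", "delta": "approaches."},
    {"type": "response.reasoning_summary_text.done", "text": "Weighing two approaches."},
]

# Concatenate deltas in arrival order to reconstruct the summary text.
parts = [ev["delta"] for ev in events if ev["type"] == "response.reasoning_summary_text.delta"]
assembled = "".join(parts)
```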
Final response payload
The terminal `response.completed` event (or a response with `status: "incomplete"`) may include a `reasoning` object. If no summary content was generated, the `summaries` array may be empty.
Non‑streaming response shape
When `stream: false` (the default), the response is a single JSON object. If tools are invoked, you may receive a response with `status: "requires_action"` and `required_action.submit_tool_outputs.tool_calls` listing the calls to execute.
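Handling a `requires_action` response might look like the sketch below. The field path `required_action.submit_tool_outputs.tool_calls` is from this page; the per-call field names (`id`, `function.name`, `function.arguments`) and the weather tool are assumed for illustration.

```python
import json

# Illustrative non-streaming response containing one pending tool call.
resp = {
    "id": "resp_123",
    "status": "requires_action",
    "required_action": {
        "submit_tool_outputs": {
            "tool_calls": [
                {
                    "id": "call_1",
                    "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
                }
            ]
        }
    },
}

tool_outputs = []
if resp["status"] == "requires_action":
    for call in resp["required_action"]["submit_tool_outputs"]["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        result = f"Sunny in {args['city']}"  # run your real tool here
        tool_outputs.append({"tool_call_id": call["id"], "output": result})

# Follow-up request submitting the outputs for the same thread.
followup = {
    "model": "gpt-5",  # assumed model id
    "previous_response_id": resp["id"],
    "tool_outputs": tool_outputs,
}
```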
Include flags
Use the include parameter to request extra data in streaming events or final payloads. See examples in this page.
See also: OpenAI Responses events
Streaming (SSE) events
When `stream: true`, the connection emits official Responses events only:
- `response.created`
- `response.in_progress`
- `response.output_item.added`
- `response.content_part.added`
- `response.output_text.delta`
- `response.function_call_arguments.delta`
- `response.function_call_arguments.done`
- `response.output_text.done`
- `response.content_part.done`
- `response.output_item.done`
- `response.requires_action` OR `response.completed`
Additional terminal signals:
- `response.incomplete` – when output was cut due to `max_output_tokens` or a similar budget
- `response.failed` – emitted on stream failure; followed by `response.completed` with `status: "failed"`
The final event carries the response with status `completed` or `requires_action` and may include a `usage` object.
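A minimal consumer of this stream might parse `data:` lines and accumulate text deltas as sketched below. The event type names come from the list above; the `delta`/`response` field shapes and sample payloads are assumptions.

```python
import json

# Sample SSE data lines as they might arrive (payload shapes illustrative).
raw_stream = [
    'data: {"type": "response.created"}',
    'data: {"type": "response.output_text.delta", "delta": "Hel"}',
    'data: {"type": "response.output_text.delta", "delta": "lo"}',
    'data: {"type": "response.output_text.done", "text": "Hello"}',
    'data: {"type": "response.completed", "response": {"status": "completed", "usage": {"total_tokens": 12}}}',
]

text_parts, final = [], None
for line in raw_stream:
    if not line.startswith("data: "):
        continue  # skip comments, blank keep-alives, etc.
    event = json.loads(line[len("data: "):])
    if event["type"] == "response.output_text.delta":
        text_parts.append(event["delta"])
    elif event["type"] == "response.completed":
        final = event["response"]  # terminal event carries status and usage

output_text = "".join(text_parts)
```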
Post-stream usage
After a stream ends, fetch token usage and metadata via the generation ID (`response.id`).
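This page does not give the usage-lookup path, so the sketch below uses an entirely hypothetical URL; substitute the real generations endpoint from the KushRouter docs. The request is constructed but not sent.

```python
import urllib.request

response_id = "resp_123"  # the generation ID captured from response.id
API_KEY = "sk-..."        # placeholder key

# Hypothetical lookup URL; replace with the documented generations endpoint.
url = f"https://api.kushrouter.com/api/v1/generations/{response_id}"
req = urllib.request.Request(
    url,
    headers={"Authorization": f"Bearer {API_KEY}"},
    method="GET",
)
# usage = json.load(urllib.request.urlopen(req))  # uncomment to actually fetch
```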