API reference
OpenAI Responses (/api/openai/v1/responses)

OpenAI Responses (Streaming and Non‑Streaming)

Endpoint: POST /api/openai/v1/responses

Use this endpoint for the OpenAI "Responses" API style. It supports both non‑streaming JSON responses and SSE streaming with official Responses events. Authentication is via Authorization: Bearer $API_KEY.

Note: The gpt-5-codex model is supported via the Responses API only. Use model: "gpt-5-codex" with this endpoint.

Note: You can also call Claude models via this OpenAI Responses surface; the request must follow the OpenAI Responses syntax. See Cross‑provider compatibility.
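
A minimal non-streaming request can be sketched as follows. This is an illustration, not an official client: the host in the commented-out call is a placeholder, and `$API_KEY` stands in for your real key.

```python
import json

# Minimal non-streaming request body for POST /api/openai/v1/responses.
payload = {
    "model": "gpt-5-codex",              # Responses-API-only model (see note above)
    "input": "Write a haiku about rivers.",
}
headers = {
    "Authorization": "Bearer $API_KEY",  # replace with your real key
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# Sending it (sketch; the host below is a placeholder, not a documented URL):
# import urllib.request
# req = urllib.request.Request(
#     "https://<your-router-host>/api/openai/v1/responses",
#     data=body.encode(), headers=headers, method="POST")
# result = json.loads(urllib.request.urlopen(req).read())
```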

Request schema

Required:

  • model (string)

One of the supported OpenAI-compatible model ids.

Input options (choose one of the following ways to provide input):

  • input (string | array | object with content) – direct content. A string is treated as a single user message; an array is treated as an array of content parts.
  • messages (array) – OpenAI Chat-style messages. Content arrays are normalized (e.g., input_text coerced to text).
  • instructions (string) – optional system/instructional text. When combined with input, it is sent as a system message.
  • system (string) – optional system prompt (alternative to instructions).
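
The input options above can be illustrated with three request bodies that express roughly the same prompt; the field names come from the schema above, while the prompt text itself is arbitrary.

```python
# 1. Plain string input – treated as a single user message.
as_input_string = {"model": "gpt-5", "input": "Summarize this paragraph."}

# 2. Chat-style messages – content arrays are normalized
#    (e.g. input_text parts are coerced to text).
as_messages = {
    "model": "gpt-5",
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": [{"type": "input_text", "text": "Summarize this."}]},
    ],
}

# 3. instructions combined with input – instructions is sent as a system message.
as_instructions = {
    "model": "gpt-5",
    "instructions": "You are terse.",
    "input": "Summarize this paragraph.",
}
```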

Tools:

  • tools (array) – OpenAI function tools list.
  • tool_choice (string | object) – e.g., "auto", or { type: "function", function: { name: "..." } }.
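
A tools request might look like the following sketch. The nested `function` object mirrors the `tool_choice` example above; the `get_weather` tool itself is hypothetical.

```python
payload = {
    "model": "gpt-5",
    "input": "What's the weather in Paris?",
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",                       # hypothetical tool
            "description": "Get current weather for a city.",
            "parameters": {                              # JSON Schema for arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Force this specific tool; "auto" would let the model decide.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```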

Generation controls:

  • temperature (number) – 0..2
  • top_p (number) – 0..1
  • max_output_tokens (integer) – maximum tokens for output
  • response_format (object) – { type: "text" | "json_object" | "json_schema", json_schema? }
  • reasoning_effort (string) – low | medium | high (where applicable)
  • reasoning_summary (string) – alias for reasoning.summary. Accepted values depend on the model (see "Supported values by model" below). Common values include auto, concise, detailed, and for some GPT‑5 models none.
  • mcp_servers (array) – MCP server configurations, if used
  • stream (boolean) – when true, returns an SSE stream of official Responses events

Additional OpenAI-compatible parameters supported by KushRouter (passed through when applicable):

  • web_search_options (object) – enable web search features for models that support it
  • prompt_cache_retention ("24h") – opt-in retention for provider-side prompt cache (Responses API)
  • include (string | string[]) – request additional data in events (see Include flags below)
  • previous_response_id (string) – continue a prior Responses thread
  • tool_outputs (array) – submit outputs after a requires_action response
  • truncation (object) – output truncation controls
  • max_tool_calls (number) – hard cap on parallel tool invocations
  • audio, modalities, prediction – multimodal controls where supported
  • strict, verbosity – extra validation/diagnostic knobs
  • metadata, store – provider metadata and storage hints where supported
  • conversation (string) – conversation/session identifier

Notes:

  • Unknown top-level parameters are rejected with HTTP 400.
  • Payloads larger than allowed size return HTTP 413.

Reasoning summaries

Set a reasoning summary mode to receive thinking/summarization events for GPT‑5 family models.

  • Send the shortcut field "reasoning_summary": "auto" | "concise" | "detailed" | "none" alongside the other request parameters.
  • Alternatively, embed it inside the reasoning object: "reasoning": { "effort": "high", "summary": "auto" }.
  • The value none explicitly disables provider summaries (other values require GPT‑5 models).
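
Since both the shortcut field and the nested form are accepted, a client may want a small helper that applies summary settings in the nested `reasoning` form. The helper below is a client-side convenience sketch, not part of the API itself.

```python
def with_reasoning_summary(payload, summary, effort=None):
    """Return a copy of `payload` with reasoning summary settings applied
    in the nested form: {"reasoning": {"summary": ..., "effort": ...}}.
    Does not mutate the original payload."""
    out = dict(payload)
    reasoning = dict(out.get("reasoning", {}))
    reasoning["summary"] = summary       # e.g. "auto", "concise", "detailed", "none"
    if effort is not None:
        reasoning["effort"] = effort     # e.g. "low", "medium", "high"
    out["reasoning"] = reasoning
    return out
```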

Supported values by model

KushRouter validates reasoning.summary/reasoning_summary up‑front and returns HTTP 400 for unsupported values with a descriptive error. Support varies by model.

Streaming behavior

When summaries are enabled and the model supports them, the SSE stream may emit:

  • response.reasoning_summary_part.added
  • response.reasoning_summary_part.done
  • response.reasoning_summary_text.delta
  • response.reasoning_summary_text.done

These deltas accumulate into the final reasoning.summaries[*].text payload.

Final response payload

The terminal response.completed event (or a response with status: "incomplete") may include a reasoning object. If no summary content was generated, the summaries array may be empty.

Non‑streaming response shape

When stream is false (the default), the response is a single JSON object. If the model invokes tools, the object may carry status: "requires_action" together with required_action.submit_tool_outputs.tool_calls listing the calls for you to execute.
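
The requires_action round trip can be sketched as a small client-side loop. This is an assumption built from the tool_outputs and previous_response_id parameters listed earlier; field names such as tool_call_id are illustrative and may differ in your deployment.

```python
import json

def build_tool_outputs_followup(response, tool_registry):
    """Execute each requested function call against a local registry of
    Python callables, then build the follow-up request body that submits
    the results (tool_outputs + previous_response_id)."""
    calls = response["required_action"]["submit_tool_outputs"]["tool_calls"]
    outputs = []
    for call in calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"] or "{}")
        result = tool_registry[name](**args)     # run the tool locally
        outputs.append({
            "tool_call_id": call["id"],          # illustrative field name
            "output": json.dumps(result),
        })
    return {
        "model": response.get("model"),
        "previous_response_id": response["id"],  # continue the same thread
        "tool_outputs": outputs,
    }
```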

Include flags

Use the include parameter to request extra data in streaming events or final payloads. See the examples on this page.

See also: OpenAI Responses events

Streaming (SSE) events

When stream: true, the connection emits official Responses events only.

  • response.created
  • response.in_progress
  • response.output_item.added
  • response.content_part.added
  • response.output_text.delta
  • response.function_call_arguments.delta
  • response.function_call_arguments.done
  • response.output_text.done
  • response.content_part.done
  • response.output_item.done
  • response.requires_action OR response.completed

Additional terminal signals:

  • response.incomplete – when output was cut due to max_output_tokens or similar budget
  • response.failed – emitted on stream failure; followed by response.completed with status: "failed"

The final event carries the response with status completed or requires_action and may include a usage object.
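
The event sequence above can be consumed with a small accumulator over standard SSE framing (`event:` and `data:` lines, messages separated by blank lines). The parser below is a sketch that collects only response.output_text.delta payloads into the final text; a real client would also watch the terminal events.

```python
import json

def accumulate_output_text(sse_lines):
    """Accumulate text from response.output_text.delta events.
    `sse_lines` is an iterable of decoded SSE lines."""
    event, parts = None, []
    for line in sse_lines:
        if line.startswith("event: "):
            event = line[len("event: "):].strip()
        elif line.startswith("data: ") and event == "response.output_text.delta":
            payload = json.loads(line[len("data: "):])
            parts.append(payload.get("delta", ""))
        elif line == "":
            event = None  # a blank line terminates one SSE message
    return "".join(parts)
```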

Post-stream usage

After a stream ends, fetch token usage and metadata via the generation ID (response.id).

See also