OpenAI Responses (Streaming and Non‑Streaming)
Endpoint: POST /api/openai/v1/responses
Use this endpoint for the OpenAI "Responses" API style. It supports both non‑streaming JSON responses and SSE streaming with official Responses events. Authentication is via Authorization: Bearer $API_KEY.
Note: The `gpt-5-codex` model is supported via the Responses API only. Use `model: "gpt-5-codex"` with this endpoint.
Note: You can also call Claude models via this OpenAI Responses surface; the request must follow the OpenAI Responses syntax. See Cross‑provider compatibility.
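As a sketch of a basic call against this endpoint, the snippet below builds a minimal non-streaming request with Python's standard library. The base URL and API key are placeholders (assumptions, not values from this page); the request is constructed but not sent.

```python
import json
import urllib.request

API_KEY = "sk-..."  # placeholder; use your real KushRouter API key
BASE_URL = "https://api.kushrouter.com"  # assumed host; substitute your deployment's base URL

# Minimal non-streaming Responses payload.
payload = {
    "model": "gpt-5-codex",
    "input": "Write a haiku about routers.",
    "max_output_tokens": 200,
}

req = urllib.request.Request(
    f"{BASE_URL}/api/openai/v1/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment to actually send the request
```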
Request schema
Required:
- `model` (string) – one of the supported OpenAI-compatible model ids.
Input options (choose one of the following ways to provide input):
- `input` (string | array | object with `content`) – direct content. If a string, it is treated as a user message. If an array, it is an array of content parts.
- `messages` (array) – OpenAI Chat-style messages. Content arrays are normalized (e.g., `input_text` is coerced to `text`).
- `instructions` (string) – optional system/instructional text. When combined with `input`, it is sent as a system message.
- `system` (string) – optional system prompt (alternative to `instructions`).
Tools:
- `tools` (array) – OpenAI function tools list.
- `tool_choice` (string | object) – e.g., `"auto"`, or `{ "type": "function", "function": { "name": "..." } }`.
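A sketch of a request carrying one function tool and a forced `tool_choice`. The tool name and schema are hypothetical illustrations, not part of this API.

```python
payload = {
    "model": "gpt-5",  # assumed model id
    "input": "What's the weather in Berlin?",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Force the model to call this specific function instead of "auto".
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```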
Generation controls:
- `temperature` (number) – 0..2
- `top_p` (number) – 0..1
- `max_output_tokens` (integer) – maximum tokens for output
- `response_format` (object) – `{ "type": "text" | "json_object" | "json_schema", "json_schema"? }`
- `reasoning_effort` (string) – `low` | `medium` | `high` (where applicable)
- `reasoning_summary` (string) – alias for `reasoning.summary`. Accepted values depend on the model (see "Supported values by model" below). Common values include `auto`, `concise`, `detailed`, and, for some GPT‑5 models, `none`.
- `mcp_servers` (array) – MCP server configurations, if used
- `stream` (boolean) – when `true`, returns an SSE stream of official Responses events
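The generation controls above might be combined as follows; the model id is an assumed placeholder and the values are only examples within the documented ranges.

```python
payload = {
    "model": "gpt-5",  # assumed model id
    "input": "Return a JSON object with keys 'a' and 'b'.",
    "temperature": 0.2,           # within 0..2
    "top_p": 0.9,                 # within 0..1
    "max_output_tokens": 512,
    "response_format": {"type": "json_object"},
    "reasoning_effort": "medium",
    "reasoning_summary": "auto",  # alias for reasoning.summary
    "stream": False,              # non-streaming JSON response
}
```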
Additional OpenAI-compatible parameters supported by KushRouter (passed through when applicable):
- `web_search_options` (object) – enable web search features for models that support it
- `prompt_cache_retention` (`"24h"`) – opt-in retention for provider-side prompt cache (Responses API)
- `include` (string | string[]) – request additional data in events (see Include flags below)
- `previous_response_id` (string) – continue a prior Responses thread
- `tool_outputs` (array) – submit outputs after a `requires_action` response
- `truncation` (object) – output truncation controls
- `max_tool_calls` (number) – hard cap on parallel tool invocations
- `audio`, `modalities`, `prediction` – multimodal controls where supported
- `strict`, `verbosity` – extra validation/diagnostic knobs
- `metadata`, `store` – provider metadata and storage hints where supported
- `conversation` (string) – conversation/session identifier
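A sketch of continuing a prior Responses thread with `previous_response_id`. The response id and the `include` value shown are hypothetical placeholders; consult the Include flags section for the flags this endpoint actually accepts.

```python
followup = {
    "model": "gpt-5",                         # assumed model id
    "previous_response_id": "resp_abc123",    # hypothetical id taken from an earlier response
    "input": "Continue from where you left off.",
    "include": ["reasoning.encrypted_content"],  # illustrative include flag; real flags vary
    "prompt_cache_retention": "24h",             # opt in to provider-side prompt caching
}
```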
Notes:
- Unknown top-level parameters are rejected with HTTP 400.
- Payloads larger than allowed size return HTTP 413.
Reasoning summaries
Set a reasoning summary mode to receive thinking/summarization events for GPT‑5 family models.
- Send the shortcut field `"reasoning_summary": "auto" | "concise" | "detailed" | "none"` alongside the other request parameters.
- Alternatively, embed it inside the `reasoning` object: `"reasoning": { "effort": "high", "summary": "auto" }`.
- The value `none` explicitly disables provider summaries (other values require GPT‑5 models).
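The two equivalent ways of requesting summaries can be sketched side by side (model id assumed):

```python
# Nested form: summary mode inside the reasoning object.
nested = {
    "model": "gpt-5",  # assumed model id
    "input": "Prove that sqrt(2) is irrational.",
    "reasoning": {"effort": "high", "summary": "auto"},
}

# Shortcut form: flat top-level aliases for the same settings.
shortcut = {
    "model": "gpt-5",
    "input": "Prove that sqrt(2) is irrational.",
    "reasoning_effort": "high",
    "reasoning_summary": "auto",
}
```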
Supported values by model
KushRouter validates `reasoning.summary`/`reasoning_summary` up‑front and returns HTTP 400 for unsupported values with a descriptive error. Support varies by model.
Streaming behavior
When summaries are enabled and the model supports them, the SSE stream may emit:
- `response.reasoning_summary_part.added`
- `response.reasoning_summary_part.done`
- `response.reasoning_summary_text.delta`
- `response.reasoning_summary_text.done`
These deltas accumulate into the final `reasoning.summaries[*].text` payload.
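Accumulating the summary deltas might look like the sketch below. The event dictionaries are illustrative shapes (only the `type` names come from this page, the `delta`/`text` field names are assumptions).

```python
# Sample summary events as they might arrive over SSE (shapes illustrative).
events = [
    {"type": "response.reasoning_summary_text.delta", "delta": "Weighing two "},
    {"type": "response.reasoning_summary_text.delta", "delta": "approaches."},
    {"type": "response.reasoning_summary_text.done", "text": "Weighing two approaches."},
]

# Concatenate deltas in arrival order to reconstruct the summary text.
parts = [ev["delta"] for ev in events if ev["type"] == "response.reasoning_summary_text.delta"]
assembled = "".join(parts)
```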
Final response payload
The terminal `response.completed` event (or a response with `status: "incomplete"`) may include a `reasoning` object. If no summary content was generated, the `summaries` array may be empty.
Non‑streaming response shape
When `stream: false` (the default), the response is a single JSON object. If tools are invoked, you may receive a response with `status: "requires_action"` and `required_action.submit_tool_outputs.tool_calls` listing the calls to execute.
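Handling a `requires_action` response might look like the sketch below. The field path `required_action.submit_tool_outputs.tool_calls` is from this page; the per-call field names (`id`, `function.name`, `function.arguments`) and the weather tool are assumed for illustration.

```python
import json

# Illustrative non-streaming response containing one pending tool call.
resp = {
    "id": "resp_123",
    "status": "requires_action",
    "required_action": {
        "submit_tool_outputs": {
            "tool_calls": [
                {
                    "id": "call_1",
                    "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
                }
            ]
        }
    },
}

tool_outputs = []
if resp["status"] == "requires_action":
    for call in resp["required_action"]["submit_tool_outputs"]["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        result = f"Sunny in {args['city']}"  # run your real tool here
        tool_outputs.append({"tool_call_id": call["id"], "output": result})

# Follow-up request submitting the outputs for the same thread.
followup = {
    "model": "gpt-5",  # assumed model id
    "previous_response_id": resp["id"],
    "tool_outputs": tool_outputs,
}
```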
Include flags
Use the include parameter to request extra data in streaming events or final payloads. See examples in this page.
See also: OpenAI Responses events
Streaming (SSE) events
When `stream: true`, the connection emits official Responses events only:
- `response.created`
- `response.in_progress`
- `response.output_item.added`
- `response.content_part.added`
- `response.output_text.delta`
- `response.function_call_arguments.delta`
- `response.function_call_arguments.done`
- `response.output_text.done`
- `response.content_part.done`
- `response.output_item.done`
- `response.requires_action` OR `response.completed`
Additional terminal signals:
- `response.incomplete` – when output was cut due to `max_output_tokens` or a similar budget
- `response.failed` – emitted on stream failure; followed by `response.completed` with `status: "failed"`
The final event carries the response with status `completed` or `requires_action` and may include a `usage` object.
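A minimal consumer of this stream might parse `data:` lines and accumulate text deltas as sketched below. The event type names come from the list above; the `delta`/`response` field shapes and sample payloads are assumptions.

```python
import json

# Sample SSE data lines as they might arrive (payload shapes illustrative).
raw_stream = [
    'data: {"type": "response.created"}',
    'data: {"type": "response.output_text.delta", "delta": "Hel"}',
    'data: {"type": "response.output_text.delta", "delta": "lo"}',
    'data: {"type": "response.output_text.done", "text": "Hello"}',
    'data: {"type": "response.completed", "response": {"status": "completed", "usage": {"total_tokens": 12}}}',
]

text_parts, final = [], None
for line in raw_stream:
    if not line.startswith("data: "):
        continue  # skip comments, blank keep-alives, etc.
    event = json.loads(line[len("data: "):])
    if event["type"] == "response.output_text.delta":
        text_parts.append(event["delta"])
    elif event["type"] == "response.completed":
        final = event["response"]  # terminal event carries status and usage

output_text = "".join(text_parts)
```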
Post-stream usage
After a stream ends, fetch token usage and metadata via the generation ID (`response.id`).
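This page does not give the usage-lookup path, so the sketch below uses an entirely hypothetical URL; substitute the real generations endpoint from the KushRouter docs. The request is constructed but not sent.

```python
import urllib.request

response_id = "resp_123"  # the generation ID captured from response.id
API_KEY = "sk-..."        # placeholder key

# Hypothetical lookup URL; replace with the documented generations endpoint.
url = f"https://api.kushrouter.com/api/v1/generations/{response_id}"
req = urllib.request.Request(
    url,
    headers={"Authorization": f"Bearer {API_KEY}"},
    method="GET",
)
# usage = json.load(urllib.request.urlopen(req))  # uncomment to actually fetch
```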