Unified Endpoint (OpenAI + Anthropic)

Endpoint: POST /api/v1/messages

Use this endpoint for a provider-agnostic request shape that supports both OpenAI and Anthropic models. It accepts a familiar messages array and streams OpenAI-compatible chunks when stream: true.

Request schema

Required fields:

  • model (string) – one of the allowed model IDs configured by the service.
  • messages (array) – conversation messages. Use OpenAI-style messages for OpenAI models or Anthropic-style content blocks for Anthropic models.

Optional fields (OpenAI-style):

  • temperature (0..2), max_tokens (int), top_p (0..1], stop (string|string[])
  • frequency_penalty (-2..2), presence_penalty (-2..2)
  • response_format – supports { type: 'text' | 'json_object' | 'json_schema', json_schema?: { name: string; schema?: object; strict?: boolean } }
  • tools, tool_choice, legacy function_call/functions
  • n, seed, user, logit_bias, parallel_tool_calls, store

Optional fields (Anthropic-style):

  • top_k (int > 0), stop_sequences (string[])
  • metadata (object), thinking (object), cache_control (object)

Common:

  • stream (boolean) – when true, the response is an SSE stream of OpenAI chat.completion.chunk frames.
  • mcp_servers (array) – optional list of MCP server definitions to enhance tool calling.
  • prompt_cache (object) – optional hinting for prompt caching.
  • timeoutMs (number) – optional per-request timeout.
  • input_file_id (string) – optional file ID to load a JSONL request from your uploaded files (first valid request will be used; body fields take precedence).

Note: The unified endpoint enforces provider-parameter compatibility. If you choose an OpenAI model, Anthropic-only params (such as top_k or stop_sequences) are rejected with HTTP 400, and vice versa.
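The compatibility check above can be sketched client-side to fail fast before making a request. This is a minimal sketch, not the service's actual implementation: the parameter sets and the claude-prefix model heuristic are assumptions drawn from the field lists above.

```python
# Assumption: Anthropic model IDs start with "claude"; the param sets below
# mirror the "Anthropic-style" and "OpenAI-style" field lists in this doc.
ANTHROPIC_ONLY = {"top_k", "stop_sequences", "metadata", "thinking", "cache_control"}
OPENAI_ONLY = {"response_format", "frequency_penalty", "presence_penalty",
               "logit_bias", "seed", "n", "parallel_tool_calls", "store"}

def incompatible_params(model: str, body: dict) -> list[str]:
    """Return body params that would be rejected with HTTP 400 for this model."""
    is_anthropic = model.startswith("claude")
    forbidden = OPENAI_ONLY if is_anthropic else ANTHROPIC_ONLY
    return sorted(k for k in body if k in forbidden)
```

For example, sending top_k alongside an OpenAI model would be flagged before the request is made.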

Messages format

  • For OpenAI models, send OpenAI-style messages ({ role, content }) where content may be a string or an array with parts like text, image_url, etc.
  • For Anthropic models, send Anthropic-style content blocks (e.g., text, tool_use, tool_result).
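To make the two shapes concrete, here are illustrative message payloads for each provider (field names follow the OpenAI and Anthropic Messages APIs; the URLs and IDs are placeholders):

```python
# OpenAI-style: content is a string or an array of typed parts.
openai_messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ]},
]

# Anthropic-style: content is an array of content blocks.
anthropic_messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "What did the tool return?"},
        {"type": "tool_result", "tool_use_id": "toolu_123", "content": "42"},
    ]},
]
```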

File input

If you provide input_file_id, the server reads the referenced file (uploaded via Files API) which should contain newline-delimited JSON (JSONL) requests. The first valid request is merged with the request body, with body properties taking precedence.
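The merge behavior can be sketched as follows; this is an illustrative helper (merge_file_request is not part of the API), but it reflects the rules above: skip invalid JSONL lines, take the first valid request, and let body fields win.

```python
import json

def merge_file_request(jsonl_text: str, body: dict) -> dict:
    """Merge the first valid JSONL request with the request body;
    body fields take precedence, as described for input_file_id."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            file_req = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip invalid lines; only the first *valid* request is used
        if isinstance(file_req, dict):
            return {**file_req, **body}
    return dict(body)
```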

cURL example (streaming)

curl -N -X POST "https://api.kushrouter.com/api/v1/messages" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Write a haiku about routers" }
    ],
    "stream": true
  }'

Response is an SSE stream of OpenAI-style chat.completion.chunk objects ending with [DONE].
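A minimal consumer of that stream only needs to read `data:` lines, stop at the [DONE] sentinel, and parse each payload as JSON. A sketch (assuming the whole SSE body is available as text; a real client would read line by line from the HTTP response):

```python
import json

def iter_chunks(sse_text: str):
    """Yield parsed chat.completion.chunk objects from an SSE body,
    stopping at the [DONE] sentinel."""
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # ignore blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```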

Error responses

Validation failures return HTTP 400 with a descriptive message, for example:

{"error": "Parameters not supported for Anthropic models: response_format, frequency_penalty"}

Other errors:

  • 401 – missing/invalid API key
  • 413 – payload too large
  • 429 – rate limit exceeded (key or IP)
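For 429 responses, a common client-side pattern is to retry with exponential backoff and jitter. The service does not mandate any particular retry policy; this is just one reasonable sketch:

```python
import random

def backoff_delays(max_retries: int = 4, base: float = 0.5):
    """Yield exponentially growing, jittered sleep durations (in seconds)
    for retrying rate-limited requests."""
    for attempt in range(max_retries):
        # jitter factor in [0.5, 1.0) avoids synchronized retry bursts
        yield base * (2 ** attempt) * (0.5 + random.random() / 2)
```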

Tool calls

When tools are used, incremental function call deltas are streamed via choices[0].delta.tool_calls[].function.arguments (OpenAI format). If the provider only supplies final arguments at the end, a final tool-call delta with complete arguments is emitted. The final chunk sets finish_reason to tool_calls when applicable.
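Because arguments arrive as string fragments, a client must concatenate the deltas per tool-call index before parsing them as JSON. A sketch of that accumulation (the helper name is illustrative):

```python
def accumulate_tool_calls(chunks):
    """Concatenate streamed function-argument deltas per tool-call index,
    following the OpenAI delta format described above."""
    calls = {}
    for chunk in chunks:
        for tc in chunk["choices"][0].get("delta", {}).get("tool_calls", []):
            entry = calls.setdefault(tc["index"], {"name": None, "arguments": ""})
            fn = tc.get("function", {})
            if fn.get("name"):
                entry["name"] = fn["name"]  # name usually arrives in the first delta
            entry["arguments"] += fn.get("arguments", "")
    return calls
```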

Response shapes

Non‑streaming JSON response follows OpenAI Chat Completion structure. Streaming uses OpenAI chat.completion.chunk frames with:

{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "created": 1739123456,
  "model": "...",
  "choices": [
    { "index": 0, "delta": { "content": "..." }, "finish_reason": null }
  ]
}

A final frame with empty delta sets finish_reason to stop or tool_calls.
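Putting the chunk shape and the final frame together, reassembling the full message looks like this (an illustrative helper, assuming a single choice at index 0):

```python
def join_content(chunks):
    """Assemble streamed content deltas into the full message text and
    return it with the finish_reason from the final frame."""
    text, finish = [], None
    for chunk in chunks:
        choice = chunk["choices"][0]
        text.append(choice["delta"].get("content") or "")
        if choice.get("finish_reason"):
            finish = choice["finish_reason"]
    return "".join(text), finish
```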