# Prompt caching
Prompt caching lets you reuse expensive prompt prefixes across requests and pay only for the incremental deltas. KushRouter forwards each vendor’s native caching knobs and mirrors their usage metrics so you can track savings in real time.
## OpenAI prompt caching

- Cached tokens are surfaced in `usage.prompt_tokens_details.cached_tokens`, and in streaming when `stream_options.include_usage = true`.
- For Chat Completions, a client-provided `prompt_cache_key` is not forwarded; cache scoping is enforced internally.
- For Responses, opt into retention with `prompt_cache_retention: "24h"` when supported.
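For illustration, here is a minimal sketch of reading the cached-token fields described above from a response's `usage` object. The helper name and the numeric values are hypothetical; only the field paths (`prompt_tokens`, `prompt_tokens_details.cached_tokens`) come from the text.

```python
def cached_token_stats(usage: dict) -> dict:
    """Summarize prompt-cache hits from an OpenAI-style usage object.

    Reads usage["prompt_tokens"] and
    usage["prompt_tokens_details"]["cached_tokens"], tolerating
    missing fields (e.g. when caching did not apply).
    """
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    hit_rate = cached / prompt if prompt else 0.0
    return {
        "prompt_tokens": prompt,
        "cached_tokens": cached,
        "cache_hit_rate": hit_rate,
    }

# Illustrative payload shaped like a Chat Completions usage block:
usage = {
    "prompt_tokens": 2048,
    "completion_tokens": 128,
    "prompt_tokens_details": {"cached_tokens": 1536},
}
print(cached_token_stats(usage)["cache_hit_rate"])  # 0.75
```

Tracking `cache_hit_rate` per request makes it easy to verify that a supposedly static prefix is actually being reused.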
## Anthropic prompt caching

- Attach `cache_control` objects to content blocks, and inspect `cache_creation_input_tokens` and `cache_read_input_tokens` in responses and streams.
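As a sketch of the request side, the payload below marks a large static system block with a `cache_control` object so it can be written to and later read from the cache. The helper function and document text are hypothetical; the block shape follows Anthropic's block-level `cache_control` convention.

```python
def build_cached_messages(system_doc: str, question: str) -> dict:
    """Build an Anthropic-style payload whose large, reused system
    block is marked cacheable via a cache_control object."""
    return {
        "system": [
            {
                "type": "text",
                "text": system_doc,                      # large static prefix
                "cache_control": {"type": "ephemeral"},  # mark block for caching
            }
        ],
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]},
        ],
    }

payload = build_cached_messages("...long reference document...", "Summarize section 2.")
# The first request reports the write in cache_creation_input_tokens;
# subsequent requests report hits in cache_read_input_tokens.
```

Comparing `cache_read_input_tokens` against total input tokens across a batch of requests gives a direct measure of realized savings.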
## Best practices

- Promote static prefixes so the shared, cacheable portion of the prompt is as long as possible.
- Scope cache keys strategically to balance hit rates against isolation.
- Monitor the usage fields above to confirm caches are actually being hit.
- Warm caches before peak load.
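The "promote static prefixes" advice can be sketched as ordering prompt parts so stable content precedes per-request content, since vendors match cached prefixes from the start of the prompt. The helper below is a hypothetical illustration, not a KushRouter API.

```python
def order_for_caching(parts: list[dict]) -> list[dict]:
    """Place static parts before dynamic ones so the shared prefix is
    as long as possible; the stable sort preserves relative order
    within each group."""
    return sorted(parts, key=lambda p: p["dynamic"])  # False sorts first

parts = [
    {"text": "User question: ...", "dynamic": True},
    {"text": "System instructions", "dynamic": False},
    {"text": "Reference corpus", "dynamic": False},
]
ordered = order_for_caching(parts)
print([p["text"] for p in ordered])
# ['System instructions', 'Reference corpus', 'User question: ...']
```

Any per-request content placed before the static material (timestamps, request IDs) breaks the shared prefix and defeats caching, so it belongs at the end.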