Prompt caching

Prompt caching lets you reuse expensive prompt prefixes across requests and pay full price only for the uncached remainder. KushRouter forwards each vendor’s native caching controls and mirrors their usage metrics, so you can track savings in real time.

OpenAI prompt caching

  • Cached tokens are surfaced in usage.prompt_tokens_details.cached_tokens, and in streaming responses when stream_options.include_usage=true.
  • For Chat Completions, client-provided prompt_cache_key is not forwarded; scoping is enforced internally.
  • For Responses, opt into retention with prompt_cache_retention: "24h" when supported.
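As an illustration of reading those usage fields, the helper below (the function name and sample payload are ours, not part of any SDK) computes the fraction of a prompt that was served from cache:

```python
def cached_token_ratio(usage: dict) -> float:
    """Return the fraction of prompt tokens served from cache.

    `usage` is the shape returned by OpenAI-compatible Chat Completions,
    and by the final streaming chunk when stream_options.include_usage=true.
    """
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return cached / prompt if prompt else 0.0

# Sample usage payload (values are illustrative).
usage = {
    "prompt_tokens": 2048,
    "completion_tokens": 120,
    "prompt_tokens_details": {"cached_tokens": 1536},
}
print(cached_token_ratio(usage))  # 0.75
```

Tracking this ratio per route is a simple way to confirm that a shared prefix is actually hitting the cache.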

Anthropic prompt caching

  • Use cache_control objects on blocks and inspect cache_creation_input_tokens and cache_read_input_tokens in responses and streams.
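A minimal sketch of both sides of that flow, assuming an Anthropic-style Messages request body (the helper names and the model string are illustrative, not part of any SDK):

```python
def cacheable_system(text: str) -> list[dict]:
    """Wrap a static system prompt in a content block marked with
    cache_control, so the provider can cache everything up to it."""
    return [{
        "type": "text",
        "text": text,
        "cache_control": {"type": "ephemeral"},
    }]

def cache_summary(usage: dict) -> dict:
    """Summarize cache activity from an Anthropic-style usage object."""
    return {
        "written": usage.get("cache_creation_input_tokens", 0),
        "read": usage.get("cache_read_input_tokens", 0),
        "uncached": usage.get("input_tokens", 0),
    }

# Request body with a cacheable static prefix (model name illustrative).
body = {
    "model": "claude-sonnet-4",
    "max_tokens": 256,
    "system": cacheable_system("You are a support agent. <long policy text>"),
    "messages": [{"role": "user", "content": "Where is my order?"}],
}

# First call writes the cache; later calls with the same prefix read it.
print(cache_summary({"cache_creation_input_tokens": 0,
                     "cache_read_input_tokens": 1400,
                     "input_tokens": 35}))
```

On the first request you should see tokens under "written"; on subsequent requests with a byte-identical prefix, under "read".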

Best practices

  • Promote static content (system prompts, tool definitions, reference documents) to the front of the prompt.
  • Scope cache keys strategically so unrelated tenants or workflows do not evict each other.
  • Monitor the usage fields above to verify hits and quantify savings.
  • Warm caches before peak load so the first user request is not the one that pays the cache write.

See also