# Prompt caching
Prompt caching lets you reuse expensive prompt prefixes across requests and pay only for the incremental deltas. KushRouter forwards each vendor’s native caching knobs and mirrors their usage metrics so you can track savings in real time.
## OpenAI prompt caching

- Cached tokens are surfaced in `usage.prompt_tokens_details.cached_tokens`, and in streaming when `stream_options.include_usage = true`.
- For Chat Completions, a client-provided `prompt_cache_key` is not forwarded; cache scoping is enforced internally.
- For Responses, opt into retention with `prompt_cache_retention: "24h"` when supported.
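For illustration, here is a minimal sketch of reading the cached-token fields described above from a response's `usage` object. The helper name and the numeric values are hypothetical; only the field paths (`prompt_tokens`, `prompt_tokens_details.cached_tokens`) come from the text.

```python
def cached_token_stats(usage: dict) -> dict:
    """Summarize prompt-cache hits from an OpenAI-style usage object.

    Reads usage["prompt_tokens"] and
    usage["prompt_tokens_details"]["cached_tokens"], tolerating
    missing fields (e.g. when caching did not apply).
    """
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    hit_rate = cached / prompt if prompt else 0.0
    return {
        "prompt_tokens": prompt,
        "cached_tokens": cached,
        "cache_hit_rate": hit_rate,
    }

# Illustrative payload shaped like a Chat Completions usage block:
usage = {
    "prompt_tokens": 2048,
    "completion_tokens": 128,
    "prompt_tokens_details": {"cached_tokens": 1536},
}
print(cached_token_stats(usage)["cache_hit_rate"])  # 0.75
```

Tracking `cache_hit_rate` per request makes it easy to verify that a supposedly static prefix is actually being reused.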
## Anthropic prompt caching

- Attach `cache_control` objects to content blocks, and inspect `cache_creation_input_tokens` and `cache_read_input_tokens` in responses and streams.
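As a sketch of the request side, the payload below marks a large static system block with a `cache_control` object so it can be written to and later read from the cache. The helper function and document text are hypothetical; the block shape follows Anthropic's block-level `cache_control` convention.

```python
def build_cached_messages(system_doc: str, question: str) -> dict:
    """Build an Anthropic-style payload whose large, reused system
    block is marked cacheable via a cache_control object."""
    return {
        "system": [
            {
                "type": "text",
                "text": system_doc,                      # large static prefix
                "cache_control": {"type": "ephemeral"},  # mark block for caching
            }
        ],
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]},
        ],
    }

payload = build_cached_messages("...long reference document...", "Summarize section 2.")
# The first request reports the write in cache_creation_input_tokens;
# subsequent requests report hits in cache_read_input_tokens.
```

Comparing `cache_read_input_tokens` against total input tokens across a batch of requests gives a direct measure of realized savings.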
## Best practices

- Promote static prefixes so the shared, cacheable portion of the prompt is as long as possible.
- Scope cache keys strategically to balance hit rates against isolation.
- Monitor the usage fields above to confirm caches are actually being hit.
- Warm caches before peak load.
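The "promote static prefixes" advice can be sketched as ordering prompt parts so stable content precedes per-request content, since vendors match cached prefixes from the start of the prompt. The helper below is a hypothetical illustration, not a KushRouter API.

```python
def order_for_caching(parts: list[dict]) -> list[dict]:
    """Place static parts before dynamic ones so the shared prefix is
    as long as possible; the stable sort preserves relative order
    within each group."""
    return sorted(parts, key=lambda p: p["dynamic"])  # False sorts first

parts = [
    {"text": "User question: ...", "dynamic": True},
    {"text": "System instructions", "dynamic": False},
    {"text": "Reference corpus", "dynamic": False},
]
ordered = order_for_caching(parts)
print([p["text"] for p in ordered])
# ['System instructions', 'Reference corpus', 'User question: ...']
```

Any per-request content placed before the static material (timestamps, request IDs) breaks the shared prefix and defeats caching, so it belongs at the end.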