Anthropic Batch API Workflow Sees Zero Cache Hits After Adding Prompt Caching
anthropic
Source: Mastodon | Original article
Anthropic Batch API's prompt caching feature fails to deliver, with a 0% hit rate.
As we reported on April 29, Anthropic has been making waves with its Champion Kit and Claude Code features. Now, a developer has shared their experience adding prompt caching to their Anthropic Batch API workflow, only to find a 0% hit rate. The culprit is the minimum cacheable prompt length, which varies by model and is 4,096 tokens for Haiku 4.5. If the content marked with a cache control block falls below that threshold, the API silently ignores the marker: no cache writes, no cache reads, and no warning.
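A minimal sketch of the mechanism, not the developer's actual code: it assumes the official anthropic Python SDK, a hypothetical LONG_SYSTEM_PROMPT, and the "claude-haiku-4-5" model id. It shows where the cache control marker goes in a batch request and how to check the per-result usage counters, which is the only way to see that caching was skipped.

```python
import time
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The span up to and including the cache_control block must meet the model's
# minimum cacheable length; otherwise the marker is silently ignored.
LONG_SYSTEM_PROMPT = "..."  # assumed to be >= 4,096 tokens for Haiku 4.5

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 1024,
                "system": [
                    {
                        "type": "text",
                        "text": LONG_SYSTEM_PROMPT,
                        # Marks everything up to this block as a cache breakpoint.
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
                "messages": [
                    {"role": "user", "content": f"Summarize document {i}."}
                ],
            },
        }
        for i in range(10)
    ]
)

# Poll until the batch finishes, then inspect usage: zero cache_read_input_tokens
# across every result is exactly the symptom described above.
while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(30)

for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        usage = entry.result.message.usage
        print(entry.custom_id, "cache reads:", usage.cache_read_input_tokens)
```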
This discovery matters because prompt caching can significantly reduce API costs: some users report savings approaching 90% on repeated input tokens after the first request, since cache reads are billed at a fraction of the base input rate. Anthropic's prompt caching is designed to optimize workloads with long, repeated system prompts, making it a crucial feature for developers looking to cut costs. The developer's observation that the Batch API is a "completely different beast" suggests that caching strategies tuned for the regular Messages API will need to be adapted before they pay off in batch workloads.
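A back-of-the-envelope sketch of where the "up to 90%" figure comes from, assuming Anthropic's published multipliers for the 5-minute cache (writes at 1.25x the base input rate, reads at 0.1x); the request count and prompt size below are made up for illustration.

```python
# Relative cost per input token; only the ratios matter here.
BASE_INPUT_RATE = 1.0
CACHE_WRITE_RATE = 1.25 * BASE_INPUT_RATE  # first request pays a write premium
CACHE_READ_RATE = 0.10 * BASE_INPUT_RATE   # subsequent requests read the cache

prompt_tokens = 8_000  # shared system prompt, above the 4,096-token minimum
requests = 100

without_cache = requests * prompt_tokens * BASE_INPUT_RATE
with_cache = (
    prompt_tokens * CACHE_WRITE_RATE                      # one cache write
    + (requests - 1) * prompt_tokens * CACHE_READ_RATE    # the rest are reads
)

# Prints roughly 89% savings on the shared-prompt portion of the bill.
print(f"savings on the shared prompt: {1 - with_cache / without_cache:.0%}")
```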
Moving forward, developers will need to check each model's minimum cacheable token count when implementing prompt caching in their Anthropic Batch API workflows, and verify cache reads in the usage counters rather than assuming the feature is working. As Anthropic continues to evolve its features and pricing, it will be essential to monitor updates and best practices for optimizing API costs. With the potential for significant savings, developers will be watching closely to see how Anthropic addresses this silent failure mode in its prompt caching feature.