OPENAI API EMBRACES "PROMPT CACHING" (March 22, 2026)
| Source: Mastodon | Original article
OpenAI rolled out “prompt caching” for its API on 22 March 2026, a feature that automatically stores the tokenised representation of any prompt 1,024 tokens or longer and reuses it when the same text is sent again. The system routes repeat requests to the server that already processed the prompt, bypassing the full inference step for the cached prefix and cutting both compute time and token‑based charges.
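To see whether a given request actually benefited, developers can inspect the usage block of the API response. A minimal sketch, using a stubbed response rather than a live call; the field names (`prompt_tokens_details.cached_tokens`) mirror OpenAI's documented usage format, but verify them against the current API reference:

```python
# Hedged sketch: measuring the cache-hit share from a response's usage
# block. The "usage" dict here is a stub, not a live API response.

def cached_fraction(usage: dict) -> float:
    """Return the share of prompt tokens served from the cache."""
    details = usage.get("prompt_tokens_details", {})
    cached = details.get("cached_tokens", 0)
    total = usage.get("prompt_tokens", 0)
    return cached / total if total else 0.0

# Stubbed usage block as it might look on a second, identical request:
# half the prompt tokens were served from the cache.
usage = {"prompt_tokens": 2048, "prompt_tokens_details": {"cached_tokens": 1024}}
print(f"cache hit share: {cached_fraction(usage):.0%}")  # → cache hit share: 50%
```

Because only the matching prefix is cached, this fraction grows as more of the prompt is kept static across calls.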
The move matters because prompt‑heavy workloads—retrieval‑augmented generation, chain‑of‑thought reasoning and multimodal pipelines—often resend identical system or user prompts thousands of times. By caching these static fragments, developers can shave latency by up to 70% and reduce API bills by a comparable margin, according to OpenAI’s internal benchmarks. The feature also introduces a new `prompt_cache_retention` parameter, letting users choose short‑term (minutes) or longer‑term (hours) storage, a flexibility first hinted at when OpenAI announced the concept in October 2024.
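Exploiting the cache mostly comes down to request ordering: the long, unchanging content must come first so repeated calls share a cacheable prefix. A minimal sketch of that structure; the `prompt_cache_retention` value and model name are illustrative assumptions, not confirmed settings:

```python
# Hedged sketch: structuring a chat request so the long, static system
# prompt (>= 1,024 tokens) forms a stable prefix the cache can reuse.
# "prompt_cache_retention" and its "24h" value are assumptions drawn
# from the article; check OpenAI's current docs for exact names.

STATIC_SYSTEM_PROMPT = "You are a meticulous support assistant. " * 200  # long, unchanging

def build_request(user_question: str) -> dict:
    """Place static content first; only the trailing user turn varies per call."""
    return {
        "model": "gpt-4o",                    # illustrative model name
        "prompt_cache_retention": "24h",      # assumed longer-term retention value
        "messages": [
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": user_question},           # varies per call
        ],
    }

req = build_request("How do I reset my password?")
```

Sending `req` twice should let the second call reuse the cached system-prompt prefix; putting variable content (timestamps, user IDs) before the static block would defeat the prefix match.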
Prompt caching arrives alongside other efficiency tools unveiled at OpenAI’s recent DevDay, such as the Realtime API and model distillation, signalling a broader strategy to lower the cost barrier that has accompanied the rapid scaling of large language models. The timing is notable after OpenAI’s $12 billion funding round earlier this month and a spate of copyright lawsuits that have put pressure on the company to demonstrate responsible, cost‑effective deployment.
What to watch next: early adopters will publish performance case studies that could reshape pricing expectations for retrieval‑augmented generation services. Competitors are likely to accelerate their own caching solutions—Anthropic already claims 90% cost cuts—so a wave of feature‑parity battles may follow. Finally, OpenAI’s pricing sheet will reveal whether cached prompts are billed at a reduced rate, a detail that could tip the economics of large‑scale AI applications in the Nordic market and beyond.