The AI industry loves token inflation. Your company shouldn't…
Source: Mastodon
A new study titled “Lost in the Middle” upends a long‑standing assumption in enterprise AI: that feeding a language model ever more context will inevitably improve its output. The paper, authored by researchers from Stanford and DeepMind and posted on arXiv this week, demonstrates that beyond a modest window of roughly 1,000 tokens, additional context not only yields diminishing returns but can actively degrade performance on tasks ranging from document summarisation to code completion. The authors trace the effect to “token inflation” – a runaway increase in the number of tokens processed without a commensurate gain in signal, which inflates compute costs and latency.
The findings matter because most commercial LLM services price usage per token. Enterprises that indiscriminately prepend large knowledge bases or conversation histories to prompts may be paying for wasted compute while seeing no quality boost. In a market where AI‑driven SaaS products are already under pressure from the Nasdaq correction we covered on April 10, the cost inefficiency highlighted by the study could tighten profit margins for firms that rely heavily on OpenAI, Anthropic or Cohere APIs. Moreover, the environmental impact of unnecessary token processing adds a sustainability dimension to the business case for more disciplined prompting.
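The per-token pricing arithmetic can be sketched as follows. The price and request volumes below are hypothetical placeholders for illustration, not actual vendor rates:

```python
# Minimal sketch: how padding prompts with unused context inflates input-token
# spend. The per-1k-token price is a hypothetical placeholder, not a real rate.

PRICE_PER_1K_INPUT_TOKENS = 0.01  # illustrative USD rate, not a vendor quote

def monthly_prompt_cost(tokens_per_request: int, requests_per_month: int,
                        price_per_1k: float = PRICE_PER_1K_INPUT_TOKENS) -> float:
    """Estimated monthly spend on input tokens alone."""
    return tokens_per_request / 1000 * price_per_1k * requests_per_month

lean = monthly_prompt_cost(1_000, 100_000)      # ~1k-token prompts
inflated = monthly_prompt_cost(8_000, 100_000)  # same requests, padded context

print(f"lean:     ${lean:,.2f}")
print(f"inflated: ${inflated:,.2f}")
print(f"wasted:   ${inflated - lean:,.2f}")
```

At identical request volume, the padded prompts cost eight times as much on input tokens, with no quality gain if the study's findings hold.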
What to watch next is how AI platform providers respond. OpenAI, for instance, has begun experimenting with “context‑window pricing” that discounts tokens beyond a certain length, while Anthropic is promoting retrieval‑augmented generation as a way to keep prompts lean. Companies are likely to adopt new prompt‑engineering best practices, such as dynamic chunking and selective retrieval, and to explore emerging token‑efficient architectures like LongLoRA and FlashAttention. Follow‑up research from the same groups is expected later this year, potentially shaping industry standards for cost‑effective, high‑quality AI deployment.
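The chunking-plus-selective-retrieval pattern mentioned above can be sketched minimally. The word-overlap scorer and word-count token proxy here are illustrative stand-ins; production systems typically use embedding similarity and a real tokenizer:

```python
# Hedged sketch of selective retrieval: rather than prepending an entire
# knowledge base, split documents into chunks, score each chunk against the
# query, and keep only the best chunks within a fixed token budget.
# Scoring is naive word overlap; real systems use embeddings.

def chunk(text: str, max_words: int = 50) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def overlap_score(query: str, passage: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def select_context(query: str, documents: list[str],
                   token_budget: int = 1000) -> list[str]:
    """Greedily pack the highest-scoring chunks into the token budget."""
    chunks = [c for doc in documents for c in chunk(doc)]
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    selected, used = [], 0
    for c in ranked:
        cost = len(c.split())  # word count as a rough token proxy
        if used + cost > token_budget:
            break
        selected.append(c)
        used += cost
    return selected
```

Capping the budget near the ~1,000-token window the study identifies keeps prompts lean while retaining the most relevant material.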