Verbosity Decreases Accuracy in Large Language Models
Source: Unite.AI
Researchers at the University of Copenhagen have published a study showing that large language models (LLMs) become more accurate when they are forced to keep answers short. The team measured performance across a suite of reasoning and factual‑recall benchmarks, comparing standard prompting with a “concise‑only” constraint that caps output length. Across models ranging from 7 billion to 70 billion parameters, the concise setting reduced factual errors by up to 12 percentage points and improved reasoning scores on chain‑of‑thought tasks. The authors label the phenomenon “Verbosity Compensation” (VC), arguing that models allocate part of their capacity to generating elaborate prose at the expense of logical precision.
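The "concise-only" setup described above can be sketched in a few lines. The helper below is hypothetical and not taken from the study; it only illustrates how a brevity instruction in the prompt might be paired with a hard cap on output tokens, which is how such a constraint is typically enforced against an LLM API.

```python
# Hypothetical sketch of a concise-only prompting constraint.
# The exact wording and length cap used by the researchers is not
# specified here; the values below are illustrative assumptions.

def build_prompt(question: str, concise: bool = True, max_words: int = 50) -> dict:
    """Return a prompt payload, optionally with a brevity constraint."""
    if concise:
        instruction = (
            f"Answer in at most {max_words} words. "
            "State only the answer and the essential reasoning."
        )
    else:
        instruction = "Answer the question in as much detail as you like."
    return {
        "prompt": f"{instruction}\n\nQuestion: {question}",
        # A hard token cap enforces the constraint server-side;
        # ~1.3 tokens per English word is a rough heuristic.
        "max_tokens": int(max_words * 1.3) if concise else None,
    }

concise_payload = build_prompt("What is the capital of Australia?")
standard_payload = build_prompt("What is the capital of Australia?", concise=False)
```

A comparison like the one in the study would then send both payloads to the same model and score the answers against a benchmark's gold labels.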
The finding matters because it challenges the prevailing assumption that longer, more detailed responses are inherently better. Current instruction‑tuning pipelines often reward verbosity, and commercial APIs charge by token, incentivising longer outputs. If brevity yields higher fidelity, developers may need to rethink prompting strategies, evaluation metrics, and even model architecture. Shorter answers also cut computational cost and latency, a practical win for real‑time applications such as chat assistants and search augmentation.
What to watch next is how the industry reacts. Prompt‑engineering guides are likely to incorporate length limits, and major providers may roll out “concise‑mode” switches for their APIs. Researchers are already exploring fine‑tuning techniques that internalise VC, while model builders such as Mistral AI, whose LongCoT variant is explicitly trained for extended discourse, may release trimmed counterparts. Follow‑up studies will test whether the effect holds for multimodal models and for tasks that genuinely require long‑form generation, such as report writing or creative storytelling. The debate over optimal answer length is set to become a new front in the race for trustworthy, efficient AI.