GitHub - JuliusBrussee/caveman: 🪨 why use many token when few token do trick — Claude Code skill that cuts 75% of tokens by talking like caveman
Source: Mastodon
A GitHub user, Julius Brussee, has released a community‑built “Caveman” skill for Anthropic’s Claude that rewrites prompts and responses in a stripped‑down, primitive style, cutting output tokens by roughly 75%. The repository, titled *caveman* and posted just 18 hours ago, hooks into the Claude Code skills mechanism and instructs the model to adopt a “caveman‑speak” grammar – short, predictable phrases that convey the same logical content in far fewer words. A parallel project, *caveman‑compression* by wilpel, describes the same principle as semantic compression: removing predictable grammar while preserving factual meaning.
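The principle is simple enough to sketch in a few lines. The toy function below is not the repository's actual implementation – the word list and compression rule are illustrative assumptions – but it shows the core idea: strip predictable function words and keep the content-bearing tokens.

```python
import re

# Illustrative sketch of "caveman compression": drop predictable
# function words (articles, auxiliaries, filler) and keep only the
# content-bearing tokens. The FILLER set is an assumption for this
# example, not the skill's actual word list.
FILLER = {
    "a", "an", "the", "is", "are", "was", "were", "be", "been", "being",
    "that", "which", "of", "to", "in", "on", "for", "with", "it", "this",
    "very", "really", "just", "please", "basically",
}

def cavemanize(text: str) -> str:
    """Return a compressed version of `text` with filler words removed."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    kept = [w for w in words if w.lower() not in FILLER]
    return " ".join(kept)

original = ("The function is failing because the input list that was passed "
            "to it is empty, so you need to add a guard clause.")
compressed = cavemanize(original)
ratio = 1 - len(compressed.split()) / len(original.split())
print(compressed)
print(f"word reduction: {ratio:.0%}")
```

Real semantic compression is subtler than a stop-word filter, of course: the skill relies on the model itself to decide which grammar is recoverable from context, which is why meaning survives even aggressive trimming.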
The significance is twofold. First, token consumption directly drives cost and latency for LLM‑powered services, so a 75% reduction can translate into noticeable savings for developers who run Claude at scale. Second, the technique touches a broader debate about context windows that we explored in our April 5 piece, “The AI Context Window Trap: Why More Context Makes Your System Worse.” By trimming output tokens, the Caveman skill effectively stretches the usable portion of Claude’s context window, letting more of the original prompt stay in memory without hitting the model’s limit.
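The cost claim is easy to check with back-of-envelope arithmetic. The per-token price and traffic figures below are assumptions chosen for illustration, not Anthropic's actual pricing or any measured workload:

```python
# Back-of-envelope estimate of what a ~75% output-token reduction saves.
# The price and volume figures are assumptions for illustration only.
PRICE_PER_MILLION_OUTPUT_TOKENS = 15.00  # assumed USD rate, not real pricing

def monthly_output_cost(tokens_per_request: int, requests: int) -> float:
    """Cost of output tokens for a month of traffic at the assumed rate."""
    total_tokens = tokens_per_request * requests
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_OUTPUT_TOKENS

baseline = monthly_output_cost(800, 100_000)  # verbose responses
caveman = monthly_output_cost(200, 100_000)   # ~75% fewer output tokens
print(f"baseline: ${baseline:.2f}, caveman: ${caveman:.2f}, "
      f"saved: ${baseline - caveman:.2f}")
```

Under these assumed numbers the saving scales linearly with request volume, which is why the trick matters far more for high-throughput services than for interactive use.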
The community response is already mixed. A Reddit thread on r/ClaudeAI celebrates the “Kevin Malone” or “Grug‑brained developer” protocol as a clever hack, while more technical users warn that the compression only affects Claude’s output, leaving input tokens untouched, and that the resulting text can be harder to read, debug, or audit.
What to watch next: Anthropic may consider integrating user‑generated compression tricks into its official toolset, or at least provide clearer guidance on custom skills. Competitors such as OpenAI and Google are likely to experiment with similar semantic compression layers, and academic research on token‑efficient prompting could soon move from novelty to standard practice. Keep an eye on any official statements from Anthropic and on follow‑up repositories that aim to preserve readability while retaining the token savings.