Gemma 4 After 24 Hours: What the Community Found vs What Google Promised
benchmarks gemma google open-source
Source: Dev.to
Google’s latest open‑source model, Gemma 4, hit the community 24 hours ago with a splash of hype: a 6 billion‑parameter transformer, Apache 2.0‑licensed, and benchmark scores that, on paper, outpace most contemporaries in reasoning, coding and multilingual tasks. As we reported on April 3, the release was positioned as a “ChatGPT‑like” experience that anyone could run on a laptop.
Early adopters on Reddit, Hacker News and GitHub have now posted real‑world results that both confirm and temper Google’s claims. On commodity hardware – a 2022‑era MacBook Air with an M2 chip – the 6 GB variant runs at roughly 2 tokens per second, far slower than the advertised “interactive latency”. On a modest 4‑GPU server, inference speeds approach the promised range, but memory‑footprint quirks force users to trim context windows. The community also uncovered a mismatch between the published benchmark suite (MMLU, HumanEval) and the model’s actual performance on open‑source evaluation tools such as lm‑eval‑harness, where Gemma 4 trails Llama 3.1 on code generation and falls short on complex reasoning.
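The tokens-per-second figures the community reported are straightforward to reproduce. A minimal sketch of the measurement, in Python, assuming a generic generation callable (the `generate` interface and the `slow_generate` stand-in below are hypothetical placeholders, not Gemma's actual API):

```python
import time

def tokens_per_second(generate, prompt, max_new_tokens=128):
    """Time a generation callable and return throughput in tokens/s.

    `generate` stands in for a real model's generate() method; the
    interface here is a hypothetical placeholder, not Gemma's API.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

def slow_generate(prompt, max_new_tokens, delay=0.01):
    """Stand-in 'model' that emits one token every `delay` seconds.

    With delay=0.5 this mimics the ~2 tok/s that early adopters
    reported on an M2 MacBook Air.
    """
    out = []
    for _ in range(max_new_tokens):
        time.sleep(delay)
        out.append("tok")
    return out
```

Swapping `slow_generate` for an actual inference call (e.g. via llama.cpp or the Transformers library) gives a like-for-like check against the advertised latency figures.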
The release matters for two reasons. First, the permissive license lowers the barrier for startups and research labs in the Nordics to embed a powerful LLM without royalty entanglements, potentially reshaping the regional AI ecosystem. Second, the gap between headline numbers and on‑device reality highlights the lingering trade‑off between openness and engineering polish that Google must resolve to compete with Anthropic’s Claude or Meta’s Llama 4.
Looking ahead, the next week will reveal whether Google will issue a performance‑tuned patch or a larger‑parameter variant, and how quickly the community will contribute optimised kernels for ARM and RISC‑V platforms. Watch for announcements on fine‑tuning pipelines, integration with Vertex AI, and any clarification from Google on the benchmark methodology that sparked the initial buzz.