Google's Gemma 4 brings AI superpowers to your device
deepmind gemma google multimodal openai open-source
Source: Benzinga on MSN
Alphabet’s DeepMind unit unveiled Gemma 4 on Thursday, expanding the open-source Gemma family with four new model sizes spanning dense and mixture-of-experts (MoE) architectures. All variants are released under the Apache 2.0 license, support a 256K-token context window, and ship with a native “reasoning mode” that enables chain-of-thought prompting without external tool calls. The bundle is positioned as a “frontier multimodal” suite that can run on anything from a mobile phone to a data-center GPU, with the largest model, a 31B-parameter MoE, fitting on a single NVIDIA H100.
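The single-H100 claim can be sanity-checked with back-of-the-envelope VRAM arithmetic. A minimal sketch, assuming common bytes-per-parameter figures and a flat 20% overhead for KV cache and activations (these numbers are our assumptions, not published Gemma 4 specs):

```python
# Rough VRAM estimate for holding a model's weights in GPU memory.
# Assumptions (not from the article): bf16 = 2 bytes/param,
# int8 = 1, int4 = 0.5; ~20% overhead for KV cache and activations.

def vram_gb(params_billion: float, bytes_per_param: float,
            overhead: float = 0.2) -> float:
    """Estimated GPU memory in GB to serve a model of the given size."""
    return params_billion * bytes_per_param * (1 + overhead)

H100_GB = 80  # NVIDIA H100 (80 GB SXM variant)

for precision, bpp in [("bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    need = vram_gb(31, bpp)
    print(f"31B @ {precision}: ~{need:.1f} GB -> fits on H100: {need <= H100_GB}")
```

Under these assumptions a 31B model needs roughly 74 GB at bf16, which squeezes onto an 80 GB H100 with little headroom; quantized variants fit comfortably.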
The launch matters because it lowers the barrier for developers who want high‑performing, multilingual AI without the recurring costs of cloud APIs. Gemma 4 covers more than 140 languages and can be deployed on‑device, a claim that dovetails with our earlier coverage of running Gemma 4 locally via LM Studio’s headless CLI and on iPhone (see our April 6 reports). By keeping inference in‑house, enterprises can cut latency, improve privacy, and avoid the $1,200‑plus monthly waste we recently exposed in API‑driven workflows.
Google pairs the model release with AI Studio tooling and documentation that let the community run Gemma 4 under frameworks such as transformers, llama.cpp, MLX, WebGPU and Rust. Early benchmarks suggest the 26B-parameter dense variant rivals proprietary offerings on reasoning tasks, while the MoE version delivers comparable quality with a fraction of the compute footprint.
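The MoE efficiency claim comes down to active-parameter counting: per-token compute scales roughly with the parameters actually used, not the total. A sketch with hypothetical numbers (Google has not published Gemma 4's expert counts or routing configuration, so the 8-expert, top-2 setup and the 75% expert share below are illustrative assumptions):

```python
# Per-token compute in a transformer scales roughly with active parameters.
# All MoE configuration numbers here are hypothetical, for illustration only.

def active_fraction(total_experts: int, active_experts: int,
                    expert_share: float) -> float:
    """Fraction of total parameters used per token.

    expert_share: portion of all parameters living in expert FFN layers;
    the remainder (attention, embeddings, etc.) is always active.
    """
    always_on = 1 - expert_share
    experts_on = expert_share * active_experts / total_experts
    return always_on + experts_on

# e.g. 8 experts, 2 routed per token, 75% of weights in experts (assumed)
frac = active_fraction(total_experts=8, active_experts=2, expert_share=0.75)
print(f"~{frac:.0%} of parameters active per token")
```

With those assumed numbers, well under half of the parameters participate in any given token, which is the sense in which an MoE model can match a dense peer at a fraction of the compute.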
What to watch next: the first wave of third‑party integrations—particularly in edge‑AI kits for robotics, AR glasses and low‑power servers—will test Gemma 4’s on‑device claims. Performance comparisons with contemporaries like Qwen 3.5 and Llama 3 will shape its standing in the open‑model race, and Google’s roadmap for incremental updates to the reasoning engine could further tighten the gap between open and closed‑source AI.