Experimental hybrid inference and new Gemini models for Android
gemini google inference
| Source: Mastodon | Original article
Google has unveiled an experimental “hybrid inference” API for Android that lets developers blend on‑device and cloud‑based Gemini models through a single Firebase interface. The Gemini Nano model runs locally via ML Kit’s Prompt API, while larger Gemini variants continue to execute in the cloud. A rule‑based router decides, in real time, whether a request is handled on the phone or offloaded to the cloud, promising lower latency, offline availability and stronger privacy for tasks such as single‑turn text generation from short prompts or single‑image inputs.
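The routing logic described above can be sketched as a simple rule set. Everything here is illustrative: the class, method and threshold names are hypothetical stand-ins, not the actual Firebase or ML Kit API, which Google has not detailed beyond the behavior summarized in this article.

```java
// Hypothetical sketch of a rule-based hybrid router. Names and thresholds
// are invented for illustration; they are NOT the real Firebase/ML Kit API.
public class HybridRouter {
    enum Target { ON_DEVICE, CLOUD }

    // Illustrative cutoff: short, single-turn text prompts stay local.
    private static final int MAX_LOCAL_PROMPT_CHARS = 500;

    static Target route(String prompt, int imageCount, boolean online) {
        boolean fitsLocal =
                prompt.length() <= MAX_LOCAL_PROMPT_CHARS && imageCount <= 1;
        if (fitsLocal) return Target.ON_DEVICE; // fast, private, works offline
        if (online)    return Target.CLOUD;     // larger Gemini variant remotely
        return Target.ON_DEVICE;                // degrade gracefully when offline
    }

    public static void main(String[] args) {
        System.out.println(route("Summarize this note", 0, true));  // ON_DEVICE
        System.out.println(route("x".repeat(2000), 0, true));       // CLOUD
        System.out.println(route("x".repeat(2000), 0, false));      // ON_DEVICE
    }
}
```

A learned scheduler, which Google says will replace this kind of static rule set, would make the same decision from signals such as battery state, network quality and data sensitivity rather than fixed thresholds.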
The move matters because Android’s fragmented hardware landscape has long forced developers to choose between the speed and offline capability of tiny on‑device models and the richer reasoning of server‑side LLMs. By exposing a unified API, Google aims to make “on‑device + cloud” the default architecture, reducing the need for separate code paths and enabling smarter trade‑offs based on network conditions, battery state or user‑privacy preferences. The announcement follows last week’s Gemini performance surge, in which the model outscored ChatGPT on the Implicator LLM Meter, and signals Google’s intent to embed its flagship generative AI deeper into the mobile ecosystem.
What to watch next: Google says the hybrid routing logic will evolve from the current simple rule set to a learned, context‑aware scheduler that can dynamically balance cost, latency and data sensitivity. Developers can already experiment with the Firebase Hybrid SDK and a sample app that generates hotel reviews from user‑selected topics. Expect broader model availability—beyond the current text‑only and single‑image use cases—and tighter integration with Android 15’s Privacy Sandbox, which could make hybrid inference the backbone of next‑generation mobile AI experiences.