Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B
Source: Hacker News
A developer on Hacker News has demonstrated a fully local, real‑time AI agent that accepts audio or video from a user, processes it on‑device, and replies with synthesized speech, all running on Apple's M3 Pro chip with Google's Gemma E2B model. The open‑source project, posted on GitHub by fikrikarim, stitches together a WebRTC‑based pipeline (RealtimeAI) for low‑latency capture, a speech‑to‑text front end, the 2‑billion‑parameter Gemma E2B for inference, and a text‑to‑speech back end that streams the response back to the user. The entire stack runs without any cloud calls, leveraging the M3 Pro's Neural Engine to keep round‑trip latency under 200 ms, which the author describes as "conversation‑grade" performance.
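The shape of that pipeline can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `transcribe`, `generate`, and `synthesize` callables are hypothetical stand-ins for the real WebRTC/STT, Gemma E2B, and TTS stages, and the 200 ms budget mirrors the figure quoted above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class VoicePipeline:
    """Sketch of the STT -> LLM -> TTS chain described in the article."""
    transcribe: Callable[[bytes], str]    # speech-to-text front end (stand-in)
    generate: Callable[[str], str]        # LLM inference, e.g. Gemma E2B (stand-in)
    synthesize: Callable[[str], bytes]    # text-to-speech back end (stand-in)
    latency_budget_ms: float = 200.0      # "conversation-grade" target from the demo
    timings_ms: List[float] = field(default_factory=list)

    def respond(self, audio_chunk: bytes) -> bytes:
        """Run one captured audio chunk through all three stages, timing the round trip."""
        start = time.perf_counter()
        text_in = self.transcribe(audio_chunk)
        text_out = self.generate(text_in)
        speech = self.synthesize(text_out)
        self.timings_ms.append((time.perf_counter() - start) * 1000)
        return speech

    def within_budget(self) -> bool:
        """True if every recorded round trip stayed under the latency budget."""
        return all(t <= self.latency_budget_ms for t in self.timings_ms)
```

With real models plugged in, each stage would stream partial results rather than run to completion, which is how the demo keeps the perceived latency low; the sequential version above only illustrates the data flow and the budget check.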
The demo matters for two reasons. First, it shows that sophisticated multimodal agents no longer need heavyweight servers: a consumer‑grade laptop can now host a voice‑first assistant that preserves user privacy and eliminates bandwidth costs. Second, it showcases the growing maturity of open‑source LLMs such as Gemma. As we reported on April 6, Google's Gemma family already brought "AI superpowers" to edge devices, and this new demo pushes the envelope further by adding live audio/video handling. The result is a compelling alternative to proprietary offerings such as OpenAI's GPT‑4o Realtime API, which still depend on cloud infrastructure.
What to watch next: the community's response to the GitHub repo, and whether developers fork it for niche applications such as Nordic‑language tutoring or real‑time captioning. Apple's upcoming WWDC may reveal tighter integration of the Neural Engine with third‑party models, potentially shaving more milliseconds off the round trip. Finally, Google's roadmap for larger Gemma variants could enable even richer conversational experiences on the same hardware, setting the stage for a new wave of on‑device AI products across Europe's privacy‑focused markets.