GitHub - mattmireles/gemma-tuner-multimodal: Fine-tune Gemma 4 and 3n with audio, images and text on Apple Silicon, using PyTorch and Metal Performance Shaders.
apple fine-tuning gemma google meta multimodal open-source
Source: Mastodon | Original article
A new open‑source toolkit released on GitHub lets developers fine‑tune Google's Gemma 4 and the smaller, mobile‑oriented "Gemma 3n" on Apple‑silicon Macs, adding audio, image and text capabilities through LoRA adapters. The project, authored by Matt Mireles, builds on PyTorch's Metal Performance Shaders (MPS) backend, so the entire training loop runs on the GPU cores of M1, M2 and M2 Ultra chips without any external cloud resources.
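The core ideas described above, selecting the MPS device and training only small low‑rank adapter matrices while the pretrained weights stay frozen, can be sketched in plain PyTorch. This is an illustrative sketch, not code from the gemma‑tuner‑multimodal repository; the `LoRALinear` class, rank, and alpha values are assumptions for demonstration.

```python
import torch
import torch.nn as nn

# Standard PyTorch device selection: use Metal (MPS) on Apple silicon,
# otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank (LoRA) update.

    Hypothetical helper for illustration; the actual toolkit's adapter
    implementation may differ.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Low-rank factors A (in_features x rank) and B (rank x out_features);
        # B starts at zero so training begins from the base model's behavior.
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scale * (x A B): only A and B receive gradients.
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

layer = LoRALinear(nn.Linear(64, 64)).to(device)
out = layer(torch.randn(2, 64, device=device))
print(out.shape)  # torch.Size([2, 64])
```

Because only the two small factor matrices are trainable, the optimizer state and gradients stay tiny, which is what makes fine‑tuning a multi‑billion‑parameter model feasible in the unified memory of a single Mac.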
The announcement follows our coverage of Google's decision earlier this month to open‑source Gemma 4, a 9‑billion‑parameter LLM that already runs locally on phones and laptops. By extending the model to multimodal inputs and providing a native Apple‑silicon pipeline, the gemma‑tuner‑multimodal repository bridges a gap that has limited on‑device AI to text‑only workloads. Developers can now experiment with speech‑to‑text, image captioning or audio‑driven assistants directly on their Macs, preserving user privacy and cutting inference costs.
The move matters for the Nordic AI ecosystem, where a high proportion of startups and research labs rely on Mac workstations. Local multimodal fine‑tuning lowers the barrier to entry for small teams that lack access to large GPU clusters, potentially accelerating product prototypes in health tech, media analysis and edge robotics. It also showcases the growing maturity of Apple’s M‑series GPUs for deep‑learning tasks, a trend that could reshape hardware choices for AI‑first companies in the region.
Watch for community‑driven benchmarks that compare MPS‑based training speed and energy consumption against CUDA‑based setups, and for any updates from Apple that might expose additional MPS primitives or integrate the toolkit into Xcode. A subsequent wave of third‑party plugins—e.g., for real‑time audio processing or on‑device deployment to iOS—could turn the Mac into a full‑stack multimodal AI platform within months.