Fine-Tuning Gemma 3 with Cloud Run Jobs: Serverless GPUs (NVIDIA RTX 6000 Pro) for pet breed classification
fine-tuning gemma google nvidia
Source: Dev.to
Google Cloud has rolled out serverless GPU support on Cloud Run Jobs, letting developers fine-tune large language models without provisioning dedicated instances. The first public showcase uses the new NVIDIA RTX 6000 Pro (Blackwell) cards to adapt the 27-billion-parameter Gemma 3 model for a pet-breed classification task, turning a generic LLM into a specialist image-and-text recogniser for cats and dogs.
The workflow, posted by a community engineer, spins up a Cloud Run job that automatically provisions an RTX 6000 Pro, pulls the Gemma 3 weights, and runs a QLoRA-style fine-tuning loop on a curated dataset of pet images and breed labels. Pay-per-second billing, instant scaling to zero, and a 19-second cold start for the 4-billion-parameter variant mean the entire experiment costs only a few dollars and can be reproduced on demand. No quota request is required for the L4-class GPUs that power the service, lowering the barrier for small teams and hobbyists.
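A workflow like the one described can be sketched with the `gcloud` CLI: create a Cloud Run job whose container image bundles the fine-tuning script, attach a GPU, then execute the job. The project, image name, region, and resource sizes below are illustrative assumptions, not values from the article, and the GPU type string for the RTX 6000 Pro may differ by release (`nvidia-l4` is shown as a widely available type); check `gcloud run jobs create --help` for the flags supported in your gcloud version.

```shell
# Sketch: define a Cloud Run job with a GPU attached.
# The container image (an assumption) would pull the Gemma 3 weights
# and run the QLoRA fine-tuning loop as its entrypoint.
gcloud run jobs create gemma3-finetune \
  --image=us-docker.pkg.dev/my-project/tuning/gemma3-qlora:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=8 \
  --memory=32Gi \
  --task-timeout=3600

# Execute the job on demand; pay-per-second billing stops
# when the task completes, and the job scales back to zero.
gcloud run jobs execute gemma3-finetune --region=us-central1 --wait
```

Because jobs scale to zero between executions, re-running the experiment is a single `execute` call rather than a VM lifecycle to manage.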
The significance is twofold. First, it democratizes access to high-end GPU resources, a long-standing bottleneck for Nordic startups and research groups that lack on-premise clusters. Second, it signals Google's push to position Cloud Run as a viable alternative to Vertex AI for custom model work, directly competing with AWS SageMaker Serverless and Azure ML's managed compute. By coupling open-source Gemma models (first highlighted in our April 9 coverage of Gemma 4) with truly serverless hardware, Google is closing the gap between model availability and practical, low-cost deployment.
Looking ahead, the community will likely test the same pipeline on the newer Gemma 4 family and on larger GPU types as they become serverless. Watch for benchmark releases comparing cost and latency against traditional VM-based fine-tuning, and for tighter integration with tools such as Unsloth and Hugging Face's TRL, which could further accelerate niche AI applications across the Nordics.