Title: P4: Offline Low-Rank Subspace Fine-tuning (FOSDEM 2024) [2024-02-09 Fri]
embeddings fine-tuning
Source: Mastodon | Original article
A team of researchers unveiled a new approach to fine‑tuning massive language models at FOSDEM 2024, demonstrating that only a tiny slice of a model’s parameters needs to be updated to achieve task‑specific performance. The presentation, titled “P4: Offline Low‑Rank Subspace Fine‑tuning,” showed how the input‑embedding layer can be adapted via gradient descent while the bulk of the network remains frozen.
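The frozen-network setup described above can be sketched with a toy numpy model; updating only the embedding table while the downstream weights stay fixed. All names, shapes, and the learning rate here are illustrative assumptions, not details from the presentation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, n_cls = 10, 4, 3

E = rng.normal(size=(vocab, dim))    # trainable input-embedding table
W = rng.normal(size=(dim, n_cls))    # frozen downstream weights (stand-in for the network)

ids = np.array([1, 7, 3])            # toy token batch (hypothetical)
y = np.array([0, 2, 1])              # toy labels

def loss_and_grad(E):
    h = E[ids]                                        # embedding lookup
    logits = h @ W
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    loss = -np.log(p[np.arange(len(y)), y]).mean()    # cross-entropy
    dlogits = p.copy()
    dlogits[np.arange(len(y)), y] -= 1.0
    dlogits /= len(y)
    dh = dlogits @ W.T           # gradient stops here: W gets no update
    gE = np.zeros_like(E)
    np.add.at(gE, ids, dh)       # only the looked-up embedding rows change
    return loss, gE

before, g = loss_and_grad(E)
E -= 0.1 * g                     # SGD step on the embeddings alone
after, _ = loss_and_grad(E)
assert after < before            # loss drops with W untouched
```

In a real framework the same effect is achieved by marking every parameter except the embedding table as non-trainable before optimisation.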
The approach rests on two key tricks. First, a Fastfood transform re‑parameterises weight updates, turning dense gradients into a compact set of random projections that are cheap to compute and store. Second, the method builds on LoRA (Low‑Rank Adaptation), injecting low‑rank matrices—or their Kronecker‑product equivalents—into each transformer layer. By freezing the pre‑trained weights and learning only these low‑rank factors, the number of trainable parameters drops from billions to a few thousand, cutting memory and compute requirements dramatically.
The technique matters because it makes on‑device or edge‑side model adaptation feasible without sacrificing the quality of large‑scale pre‑training. As we reported on 15 April, Google’s Gemma 4 already runs fully offline on iPhones, but fine‑tuning on such constrained hardware has remained out of reach. The new low‑rank subspace method could bridge that gap, enabling personalized AI assistants, domain‑specific chatbots, and privacy‑preserving applications that learn locally from user data.
The next steps to watch include the release of an open‑source implementation, likely through TensorFlow’s Parameter Server ecosystem, and integration into popular libraries such as PyTorch‑Lightning. Industry players may soon embed the approach in SDKs for mobile and IoT devices, while academic groups are expected to benchmark it against full‑model fine‑tuning on standard NLP suites. If the early results hold, low‑rank offline adaptation could become a cornerstone of the next wave of edge AI.