Ollama is now powered by MLX on Apple Silicon in preview
Source: Hacker News
Ollama, the open‑source platform that lets developers run large language models locally, announced a preview build that leverages Apple's MLX framework to tap the full horsepower of Apple Silicon. The update replaces the previous generic backend with an MLX‑driven runner that executes as a separate subprocess, communicating with Ollama's main server over HTTP. Early tests show a "large speedup" across macOS, cutting inference latency for personal‑assistant bots such as OpenClaw and for coding agents like Claude Code, OpenCode and Codex.
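The runner‑as‑subprocess design described above can be sketched in a few lines: the "runner" is an independent process exposing an HTTP endpoint, and the main server talks to it over localhost. This is a minimal illustration of the pattern, not Ollama's actual code; the port, the `/generate` path, and the JSON fields are assumptions for the sketch.

```python
# Sketch of a model runner as a separate subprocess reached over HTTP.
# The endpoint name, port, and payload shape are illustrative only.
import json
import subprocess
import sys
import time
import urllib.request

RUNNER_SRC = r'''
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Runner(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # A real runner would invoke the model here (e.g. via MLX);
        # this toy just echoes the prompt back.
        reply = json.dumps({"response": "echo: " + body["prompt"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the subprocess quiet

HTTPServer(("127.0.0.1", 8765), Runner).serve_forever()
'''

def ask_runner(prompt: str) -> str:
    """POST a prompt to the runner subprocess and return its response."""
    req = urllib.request.Request(
        "http://127.0.0.1:8765/generate",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Launch the runner as its own OS process, as the article describes.
    proc = subprocess.Popen([sys.executable, "-c", RUNNER_SRC])
    time.sleep(0.5)  # crude wait for the server to bind
    try:
        print(ask_runner("hello"))  # -> echo: hello
    finally:
        proc.terminate()
```

Keeping the runner out of the main server process means a crash or hang in the inference engine cannot take down the server, and the backend can be swapped (ggml, MLX, …) without relinking the main binary.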
The move matters because it demonstrates how Apple's low‑level machine‑learning stack can be harnessed by third‑party tools to deliver on‑device AI that rivals cloud‑based services in responsiveness while preserving privacy. By exploiting the unified memory architecture of M‑series chips, which lets the CPU and GPU share model weights without copies, MLX reduces the need for external GPUs and cuts power draw, key factors for developers targeting laptops and desktops that run AI workloads all day. As we reported on 30 March, Apple's broader AI strategy is shifting toward on‑device models; Ollama's integration is a concrete example of that vision taking shape.
What to watch next is whether the MLX backend graduates from preview to a default component in Ollama’s upcoming stable release, and how quickly other local‑LLM runtimes adopt the same approach. Apple may also expose MLX to iOS and iPadOS, opening the door for mobile‑first AI assistants. Performance benchmarks released by the Ollama team will reveal whether the speed gains are enough to challenge cloud‑centric alternatives, and Apple’s next OS update could include tighter system‑level support for MLX‑based inference, further cementing the Mac as a hub for private, high‑performance AI.