Akira Muramoto, Stamp CEO (@1amageek) on X
Stamp Inc.’s chief executive Akira Muramoto announced on X that the company is close to delivering a runtime that layers Nvidia’s CUDA API on top of Apple’s Metal framework for large‑language‑model (LLM) workloads. The update, posted on 19 April, signals that developers will soon be able to run the same LLM inference code on both CUDA‑enabled GPUs and Apple silicon without rewriting or retargeting their pipelines.
The move matters because the AI ecosystem has become increasingly split between Nvidia‑centric data‑center GPUs and the growing fleet of Apple devices powered by M‑series chips. Current toolchains—PyTorch, TensorFlow, and Apple’s Core ML—require separate code paths or rely on third‑party bridges that add latency and maintenance overhead. By exposing CUDA’s familiar API while translating calls to Metal under the hood, Stamp aims to give engineers a single, portable interface, potentially accelerating the deployment of chatbots, code assistants and other LLM‑driven services on edge devices such as Macs, iPads and iPhones.
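Stamp has not published its API, so the following is only an illustrative sketch of what such a translation layer generally looks like: CUDA‑flavoured entry points (the `bridge*` names here are hypothetical) dispatch through a pluggable backend table, which on Apple silicon would wrap Metal buffers and on Nvidia hardware would call the CUDA driver. Host memory stands in for Metal allocations.

```c
/* Illustrative sketch only -- not Stamp's actual API.
   A CUDA-style front end dispatching to a swappable backend. */
#include <stdlib.h>
#include <string.h>

/* Backend vtable: a real bridge would route these calls to Metal
   (e.g. MTLBuffer allocations) or to the CUDA driver. */
typedef struct {
    void *(*alloc)(size_t bytes);
    void  (*release)(void *ptr);
    void  (*copy)(void *dst, const void *src, size_t bytes);
} backend_t;

/* Stand-in "Metal" backend: plain host memory here, for illustration. */
static void *metal_alloc(size_t n) { return malloc(n); }
static void  metal_release(void *p) { free(p); }
static void  metal_copy(void *d, const void *s, size_t n) { memcpy(d, s, n); }

static backend_t backend = { metal_alloc, metal_release, metal_copy };

/* CUDA-flavoured entry points (hypothetical names): existing code keeps
   its familiar calling pattern while the backend does the real work. */
int bridgeMalloc(void **ptr, size_t bytes) {
    *ptr = backend.alloc(bytes);
    return *ptr ? 0 : 1;   /* 0 plays the role of cudaSuccess */
}

int bridgeMemcpy(void *dst, const void *src, size_t bytes) {
    backend.copy(dst, src, bytes);
    return 0;
}

void bridgeFree(void *ptr) { backend.release(ptr); }
```

The design point is that only the `backend_t` table changes per platform; caller code compiled against the bridge is untouched, which is the portability claim in Muramoto’s post.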
If successful, the integration could pressure larger players to broaden their own cross‑platform support. Nvidia has hinted at “Metal‑compatible” kernels, while Apple continues to expand its on‑device ML stack. Stamp’s approach may also lower the barrier for startups that lack the resources to maintain dual‑stack codebases, fostering a more diverse set of AI applications in the Nordic market where mobile‑first solutions are common.
What to watch next: a technical preview slated for early June, where developers can test the unified runtime on a range of hardware. Follow‑up statements from Nvidia and Apple will reveal whether the industry will co‑operate on standardising such bridges, or if competing proprietary solutions will emerge. The speed of adoption will hinge on benchmark results, licensing terms and the ease with which existing CUDA code can be ported to Metal via Stamp’s layer.