Show HN: Multi-agent autoresearch for ANE inference beats Apple's CoreML by 6×
Tags: agents, apple, chips, inference
Source: Hacker News
A GitHub project posted on Hacker News this week demonstrates that a multi-agent "autoresearch" system can squeeze dramatically more performance out of Apple's Neural Engine (ANE) than the company's own Core ML framework. The open-source tool, built on Andrej Karpathy's autoresearch codebase, lets a swarm of lightweight agents explore, combine, and discard inference strategies in real time. Across a range of Apple silicon chips in iPhones, iPads, and Macs, the agents converged on pipelines that cut median latency by up to 6.31× compared with the baseline Core ML models running on the same hardware.
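The project's internals aren't detailed in the post, but the search described, agents proposing candidate pipelines, benchmarking them, and discarding slow strategies, follows a familiar evolutionary auto-tuning pattern. The sketch below is a minimal, hypothetical illustration of that loop: the search space, cost numbers, and all names are invented for demonstration, and the latency "measurement" is simulated rather than run on real ANE hardware.

```python
import random

# Hypothetical sketch of a swarm-style auto-tuning loop. Each candidate
# "pipeline" is a combination of kernel choice, memory layout, and
# scheduling strategy; agents keep fast candidates and mutate the rest.
# The search space and latency model below are illustrative, not real.
SEARCH_SPACE = {
    "kernel": ["conv_generic", "conv_winograd", "conv_tiled"],
    "layout": ["nchw", "nhwc", "chunked_64b"],
    "schedule": ["eager", "fused", "double_buffered"],
}

def measure_latency(pipeline):
    """Stand-in for an on-device benchmark; returns simulated latency (ms)."""
    cost = {"conv_generic": 9.0, "conv_winograd": 5.0, "conv_tiled": 4.0,
            "nchw": 3.0, "nhwc": 2.0, "chunked_64b": 1.5,
            "eager": 3.0, "fused": 1.5, "double_buffered": 1.0}
    return sum(cost[choice] for choice in pipeline.values())

def random_pipeline(rng):
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}

def mutate(pipeline, rng):
    # Change one dimension of the pipeline at random.
    child = dict(pipeline)
    key = rng.choice(list(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child

def autoresearch(n_agents=8, rounds=20, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(n_agents)]
    for _ in range(rounds):
        population.sort(key=measure_latency)
        survivors = population[: n_agents // 2]  # discard slow strategies
        population = survivors + [mutate(rng.choice(survivors), rng)
                                  for _ in range(n_agents - len(survivors))]
    return min(population, key=measure_latency)

best = autoresearch()
print(best, measure_latency(best))
```

Even this toy version shows why per-chip tuning pays off: the best combination depends on the latency model, so a search run against a different chip's cost profile would converge on a different pipeline.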
The result matters because Core ML is the default gateway for on-device AI on Apple products, yet its abstractions hide the ANE's low-level capabilities and offer only limited support for on-device training. By automatically discovering chip-specific kernels, memory layouts, and scheduling tricks, the autoresearch system shows that the ANE can be far more efficient than Apple's public stack suggests. Faster inference translates directly into smoother augmented-reality experiences, real-time translation, and more responsive personal-assistant features on devices that already prioritize privacy.
As we reported on 31 March, distributed LLM inference across NVIDIA Blackwell GPUs and Apple silicon had already highlighted the platform's raw potential; this new benchmark shifts the conversation from raw throughput to software-level optimization. The next steps to watch are whether Apple will open lower-level ANE APIs or fold similar auto-tuning techniques into Core ML, and how quickly third-party frameworks such as PyTorch or TensorFlow adopt the approach. Upcoming silicon generations (the M3 line and the next iPhone Fold prototypes), along with any official performance claims from Apple, will provide the next data points for gauging whether community-driven autoresearch can reshape on-device AI development.