Ivan Fioravanti Joins X

deepseek

2026-05-27 | Source: Mastodon | Original article

Ivan Fioravanti benchmarks DeepSeek V4 Flash on M3 Ultra. He shares custom quantization results.

Ivan Fioravanti has shared a benchmarking experiment on X, testing a Q4-Q8 quantization of DeepSeek V4 Flash on a single M3 Ultra. The custom quantization uses q4 at group size 32 and q8 for the rest. It yielded promising results, with q4-imatrix performing better. The experiment is particularly relevant for developers optimizing large models for local or Apple Silicon environments.

As we reported on May 1, Fioravanti has been actively exploring AI model optimization, and this experiment builds on that earlier work. A notable next step is using RDMA to distribute testing across two M3 Ultra devices, which could lead to significant performance gains. Optimizing large models remains a key challenge in the field, so the findings matter beyond this one setup. It remains to be seen how they inform future optimization work on Apple's M3 Ultra chip. With growing demand for efficient AI processing, experiments like this offer useful data points for developers and researchers working on similar projects.
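For readers unfamiliar with group-wise quantization, the sketch below shows roughly what "q4 at group size 32" and "q8 for the rest" mean in practice. It is an illustration only, not Fioravanti's code: it uses plain affine (min/max) quantization in pure Python, the function names are hypothetical, the q8 group size of 64 is an assumption (the post does not state it), and the 32 bits of per-group metadata assume a 16-bit scale plus a 16-bit offset. It does not model imatrix-style calibration.

```python
import math
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (codes, scale, offset)


def quantize_group(values: List[float], bits: int) -> Group:
    """Affine-quantize one group of weights to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # A constant group has no range; any non-zero scale reproduces it exactly.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def quantize(weights: List[float], bits: int, group_size: int) -> List[Group]:
    """Split weights into fixed-size groups and quantize each independently."""
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups: List[Group]) -> List[float]:
    """Reconstruct approximate weights from per-group codes, scale and offset."""
    out: List[float] = []
    for codes, scale, lo in groups:
        out.extend(c * scale + lo for c in codes)
    return out


def bits_per_weight(bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Effective storage cost, assuming `meta_bits` of metadata per group."""
    return bits + meta_bits / group_size


def max_abs_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def demo_weights(n: int = 128) -> List[float]:
    """Deterministic stand-in for a weight tensor (not real model data)."""
    return [math.sin(i * 0.37) for i in range(n)]


if __name__ == "__main__":
    w = demo_weights()
    # Assumed settings: q4 at group size 32 (from the post), q8 at group size 64.
    for bits, group_size in ((4, 32), (8, 64)):
        restored = dequantize(quantize(w, bits, group_size))
        print(
            f"q{bits} g{group_size}: "
            f"{bits_per_weight(bits, group_size):.2f} bits/weight, "
            f"max abs error {max_abs_error(w, restored):.5f}"
        )
```

Under these assumptions, q4 at group size 32 costs 5.0 bits per weight and q8 at group size 64 costs 8.5. This is the trade-off a mixed scheme exploits: the cheaper format goes where precision loss is tolerable, and q8 is kept elsewhere.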

