Ivan Fioravanti Joins X

deepseek

2026-05-27 | Source: Mastodon

Ivan Fioravanti tests DeepSeek V4 Flash on M3 Ultra. Performance improvements seen in prefill, but decode lags.

Ivan Fioravanti has announced that his team is working on running the MLX-based DeepSeek V4 Flash in a distributed setup, using RDMA across two M3 Ultra devices, with the model quantized in Q4/Q8 format. As we reported on May 27, Fioravanti has been actively sharing updates on his work with MLX and DeepSeek. Prefill performance has already improved, but decode performance still falls short of expectations.

The work matters because combining RDMA with quantization may allow faster and more efficient inference for large models on this hardware. Further updates from Fioravanti and his team are worth watching, particularly on decode performance, as are the community's response and any practical applications that emerge.
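Prefill and decode are usually reported as separate throughput figures, because prefill processes the whole prompt in one batch while decode produces output one token at a time. A minimal sketch of how the two numbers can be computed from timings follows. The class and field names are hypothetical, and the values in the example are placeholders for illustration, not measurements from Fioravanti's tests.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Timings for one generation run (all field names are hypothetical)."""
    prompt_tokens: int       # tokens processed during prefill
    prefill_seconds: float   # wall time to process the whole prompt
    generated_tokens: int    # tokens produced during decode
    decode_seconds: float    # wall time spent generating those tokens


def throughput(timing: GenerationTiming) -> tuple[float, float]:
    """Return (prefill tokens/s, decode tokens/s) for one run."""
    if timing.prefill_seconds <= 0 or timing.decode_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    prefill_tps = timing.prompt_tokens / timing.prefill_seconds
    decode_tps = timing.generated_tokens / timing.decode_seconds
    return prefill_tps, decode_tps


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not results from the post.
    example = GenerationTiming(
        prompt_tokens=2048, prefill_seconds=4.0,
        generated_tokens=256, decode_seconds=16.0,
    )
    prefill_tps, decode_tps = throughput(example)
    print(f"prefill: {prefill_tps:.1f} tok/s, decode: {decode_tps:.1f} tok/s")
```

Tracking the two figures separately makes it possible to see a change that helps one phase without helping the other, which is the situation the update describes.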
