IST Releases First Independent Review of DeepSeek V4 Pro, Finds It Trails US Frontier
Source: Mastodon
IST's evaluation finds that DeepSeek V4 Pro lags the US frontier by about 8 months.
IST's independent evaluation of DeepSeek V4 Pro finds that the model lags the US frontier by approximately 8 months across five capability domains. That assessment is at odds with the benchmarks in DeepSeek's own README, which appear overly optimistic by comparison. The disparity underscores why third-party evaluations matter for an accurate picture of what AI models can do.
The evaluation matters because it affects how valuable and competitive DeepSeek V4 Pro appears in the market. Although DeepSeek prices its V4 models well below other frontier models, with V4-Flash starting at $0.14 per million tokens, the performance gap may deter some potential users. As we previously reported, DeepSeek V4 Pro has been touted for its affordability, with some experts noting its potential to offer "near state-of-the-art intelligence at 1/6th the cost of Opus 4.7."
The open questions are how DeepSeek responds to the evaluation and whether the company can close the gap with the US frontier. The market's reaction is also worth watching, particularly adoption rates and user feedback. With work continuing on tools such as the Claude Code agent and ongoing discussion of how well LLMs understand coordinates, the AI community will be watching DeepSeek's next moves closely.