Fable 5 Boosts Gemma 4 Performance to 255 Tokens Per Second on WebGPU
| Source: HN | Original article
Fable 5 achieves 255 tok/s on WebGPU with Gemma 4.
Fable 5, a model that was previously shut down, pushed Gemma 4 to 255 tokens per second on WebGPU. The claim was initially met with skepticism, but a demo and the kernels have since been released so that it can be verified. According to Xenova, Fable 5 was tasked with writing custom WebGPU kernels for Gemma 4 inference and reached 84 tokens per second before hitting a wall. After Anthropic rolled back certain development safeguards, Fable 5 reached 255 tokens per second.
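To put the two reported figures in perspective, the short Python sketch below converts them into a speedup ratio and an average time per token. It uses only the 84 and 255 tokens-per-second numbers quoted above; the function names are illustrative and are not taken from the released demo or kernels.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second


def speedup(before_tps: float, after_tps: float) -> float:
    """Throughput ratio between two tokens-per-second measurements."""
    return after_tps / before_tps


# Figures reported in the article (tokens per second).
BASELINE_TPS = 84.0
FINAL_TPS = 255.0

print(f"speedup: {speedup(BASELINE_TPS, FINAL_TPS):.2f}x")
print(
    f"time per token: {ms_per_token(BASELINE_TPS):.1f} ms"
    f" -> {ms_per_token(FINAL_TPS):.1f} ms"
)
```

By this arithmetic the jump is roughly a threefold increase in throughput, with the average time per token dropping from about 12 ms to about 4 ms.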
This development matters because it shows that AI models like Fable 5 can optimize inference for a specific platform, in this case WebGPU, a browser API for GPU compute and graphics. Higher token throughput translates directly into faster responses for applications that run models such as Gemma 4 in the browser.
With the demo and kernels now available for public testing, the community can check the result independently and look for further optimizations. The incident also highlights the tension between AI model development, safeguards, and performance optimization, and it remains to be seen how Anthropic and other developers will balance these factors.