Benchmarking a 35B MoE Model on Dual GTX 1080 Ti GPUs: Does the Second GPU Boost Performance?
benchmarks gpu qwen
Source: Dev.to (original article)
Benchmarks show two GTX 1080 Ti cards running Qwen3.6-35B-A3B about 18% faster than a CPU-only i9-14900K.
Benchmark results are in for running Qwen3.6-35B-A3B on two eight-year-old GTX 1080 Ti cards: the setup reaches approximately 20 tokens per second. That is only about 18% faster than CPU-only inference on an i9-14900K, which reaches around 17 tokens per second. The sparse MoE (mixture-of-experts) design of Qwen3.6-35B-A3B lets it run on older hardware, but adding a second GPU does not double the speed, as one might expect.
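The percentage above follows directly from the two throughput figures. A minimal Python sketch of the arithmetic, using only the rounded numbers quoted in this article (the function name `speedup_percent` is illustrative, not from the original benchmark):

```python
def speedup_percent(baseline_tps: float, new_tps: float) -> float:
    """Relative throughput gain of new_tps over baseline_tps, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (new_tps / baseline_tps - 1.0) * 100.0


# Rounded figures quoted in the article; the underlying measurements
# may differ slightly.
cpu_tps = 17.0  # CPU-only, i9-14900K (approximate)
gpu_tps = 20.0  # dual GTX 1080 Ti (approximate)

gain = speedup_percent(cpu_tps, gpu_tps)
print(f"Dual-GPU gain over CPU-only: {gain:.1f}%")
```

With these rounded inputs the result lands just under 18%, which is consistent with the reported figure once rounding of the throughput numbers is taken into account.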
This matters because it shows that even older GPUs can run large AI models, albeit with some limitations. What makes it possible to run Qwen3.6-35B-A3B on two GTX 1080 Ti cards is the model's sparse design: as the "A3B" in its name indicates, only about 3B of its 35B parameters are active for any given token. This has implications for anyone looking to run AI models on local hardware without paying for the latest GPUs.
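A rough back-of-the-envelope sketch can illustrate why a 35B-total, 3B-active model is plausible on this hardware. The 11 GB of VRAM per GTX 1080 Ti is the card's specification; the 4-bit weight format is purely an assumption made for illustration, since the article does not say which quantization was used. The estimate also ignores the KV cache, activations and runtime overhead.

```python
GB = 1e9  # decimal gigabytes


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage needed for the weights alone."""
    return num_params * bits_per_weight / 8.0


total_params = 35e9     # "35B" in the model name
active_params = 3e9     # "A3B": roughly 3B parameters active per token
vram_per_card_gb = 11   # GTX 1080 Ti specification
num_cards = 2

# ASSUMPTION: 4-bit quantized weights. Not stated in the article.
bits_per_weight = 4

total_gb = weight_bytes(total_params, bits_per_weight) / GB
active_gb = weight_bytes(active_params, bits_per_weight) / GB
fits = total_gb <= vram_per_card_gb * num_cards

print(f"All weights:       ~{total_gb:.1f} GB")
print(f"Active per token:  ~{active_gb:.1f} GB")
print(f"Fits in {vram_per_card_gb * num_cards} GB of combined VRAM: {fits}")
```

Under these assumptions the full set of weights must be resident in memory, but only a small fraction of them is read for each generated token, which is the property that keeps per-token cost low on older cards.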
It remains to be seen how other large AI models perform on older hardware and whether similar sparse MoE designs can make them more accessible. Community experimentation with different models and hardware configurations, as seen in the Qwen 3.5 and 3.6 comparisons, will continue to push the boundaries of what is possible with local AI setups.