Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7
Simon Willison’s “pelican‑riding‑a‑bicycle” benchmark, posted on his blog this morning, put two freshly released large language models head‑to‑head in a visual test that is as whimsical as it is revealing. Running Qwen 3.6‑35B‑A3B locally on his laptop (a 35‑billion‑parameter mixture‑of‑experts model whose “A3B” suffix indicates roughly 3 billion active parameters per token), Willison generated an SVG of a pelican on a bike that many observers judged cleaner, better proportioned and aesthetically superior to the same prompt rendered by Anthropic’s new Claude Opus 4.7. The side‑by‑side comparison, posted at simonwillison.net/2026/Apr/16/qwen-beats-opus, quickly drew comments from the AI community and sparked a fresh round of informal competition among developers.
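For readers who want to run the test themselves, the setup is minimal: send Willison’s standard prompt, “Generate an SVG of a pelican riding a bicycle”, to a model and render whatever markup comes back. The sketch below assumes a local runtime that exposes an OpenAI‑compatible chat endpoint (llama.cpp’s server, Ollama and LM Studio all offer one); the URL and model identifier are placeholders, not the exact configuration Willison used.

```python
import re
import requests

# Placeholder endpoint and model name; adjust to whatever your
# local runtime (llama.cpp server, Ollama, LM Studio, ...) exposes.
ENDPOINT = "http://localhost:8080/v1/chat/completions"
MODEL = "qwen3.6-35b-a3b"

# Willison's standard benchmark prompt.
PROMPT = "Generate an SVG of a pelican riding a bicycle"

resp = requests.post(
    ENDPOINT,
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": PROMPT}],
    },
    timeout=600,
)
resp.raise_for_status()
text = resp.json()["choices"][0]["message"]["content"]

# Models often wrap the markup in prose or a code fence, so pull
# out just the first <svg>...</svg> element before saving it.
match = re.search(r"<svg.*?</svg>", text, re.DOTALL)
if match:
    with open("pelican.svg", "w") as f:
        f.write(match.group(0))
    print("Wrote pelican.svg")
else:
    print("No SVG element found in the response:")
    print(text)
```

Opening the saved file in a browser is the entire evaluation harness; the judging is done by eye, which is much of the benchmark’s charm.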
The episode matters because it shows that an open‑source model can now rival a proprietary flagship on a creative generation task while running on consumer hardware. Qwen 3.6‑35B‑A3B, released by Alibaba earlier this month, was highlighted in our coverage of its agentic coding capabilities (see our 2026‑04‑16 article). Its ability to produce high‑quality vector graphics without cloud resources challenges the narrative that cutting‑edge multimodal output is the exclusive domain of paid APIs. For Anthropic, the result is a reminder that even its most advanced model, Claude Opus 4.7, documented in the same day’s model‑card release, must keep improving its visual synthesis pipeline to stay competitive.
Looking ahead, the community will likely expand the pelican benchmark into a broader suite of SVG prompts, testing consistency, style transfer and text‑to‑SVG fidelity across model families. Anthropic may roll out updates to Opus or introduce a dedicated visual module, while Alibaba could release further fine‑tuning tools for Qwen. Industry watchers should also monitor whether cloud providers begin offering Qwen‑based inference at scale, and how open‑source momentum influences enterprise adoption of locally runnable multimodal models.