Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

815 points by mfiguiere 18 hours ago

The pelican is excellent for a 16.8GB quantized local model: https://simonwillison.net/2026/Apr/22/qwen36-27b/

I ran it on an M5 Pro with 128GB of RAM, but it only needs ~20GB of that. I expect it will run OK on a 32GB machine.

Performance numbers:

  Reading: 20 tokens, 0.4s, 54.32 tokens/s
  Generation: 4,444 tokens, 2min 53s, 25.57 tokens/s

I like it better than the pelican I got from Opus 4.7 the other day: https://simonwillison.net/2026/Apr/16/qwen-beats-opus/

throwaw12 - 15 hours ago

I feel like this time it is indeed in the training set, because it is too good to be true.
Can you run your other tests and see the difference?
- simonw - 15 hours ago
  
  It went pretty wild with "Generate an SVG of a NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER":
  https://gist.github.com/simonw/95735fe5e76e6fdf1753e6dcce360...
  - throwaw12 - 15 hours ago
    
    compared to your test with GLM 5.1, this indeed looks off
    https://xcancel.com/simonw/status/2041646779553476801
    
    simonw - 14 hours ago
    
    Yeah GLM 5.1 did an outstanding job on the possum - better than Opus 4.7 or GPT-5.4 and I think better than Gemini 3.1 Pro too.
    But GLM 5.1 is a 1.51TB model, the Qwen 3.6 I used here was 17GB - that's 1/88 the size.
    
    zamadatix - 13 hours ago
    
    The point is in the relative difference between the Pelican vs "other" test for each model suggesting the Pelican is being treated special these days (could be as simple as being common in recent data), not the relative difference between the models on the "other" case in isolation.
    
    refulgentis - 14 hours ago
    
    Hoping this doesn't turn into a pelican-SVG back-and-forth: yesterday's GPT Image 2 thread ended up being three screenfuls of "I tried the prompt too" replies, and nothing on the model until you scroll past it. I appreciate the testing, and I know this sounds like fun police, but there's a pattern where well-known commenter + one-off vibe test + 1:1 sub-threads eats the whole discussion. It being fun makes it hard to push back on without looking picky.
    
    simonw - 14 hours ago
    
    You can collapse the pelican thread with the little [-] toggle at the top.
    
    taspeotis - 14 hours ago
    
    Why would you though?
    And by the way: Thanks for relentlessly holding new models’ feet to the pelican SVG fire.
    
    refulgentis - 14 hours ago
    
    Because I want to read about Qwen, not someone's one-off vibe test followed by 1:1 conversations. (case in miniature here: which is the last comment in this thread that says something about Qwen? The root post. Is that fun policing? Yes, apologies.)
    
    simonw - 14 hours ago
    
    There's a bunch of useful information in my comment that's independent of the fact that it drew a pelican:
    1. You can run this on a Mac using llama-server and a 17GB downloaded file
    2. That version does indeed produce output (for one specific task) that's of a good enough quality to be worth spending more time checking out this model
    3. It generated 4,444 tokens in 2min 53s, which is 25.57 tokens/s