MAI-Code-1-Flash

microsoft.ai

338 points by EvanZhouDev 5 hours ago


https://microsoft.ai/models/mai-code-1-flash/

https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF

Launching seven new MAI models: https://microsoft.ai/news/building-a-hillclimbing-machine-la...

bel8 - 3 hours ago

It's a start and I welcome competition but I don't think I ever used small cloud models like Haiku 4.5. They are cute but for serious coding they tend to waste your expensive time.

And this certainly won't bring me back to GitHub Copilot, which I cancelled yesterday.

GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs: https://www.reddit.com/r/GithubCopilot

I have since changed to DeepSeek Flash on high, which is Sonnet+ level for almost free.

If I feel I still need smarter models I might sign up for $20/mo Codex to use GPT 5.5 which, in my opinion, is the best I can access right now.

camelmel - 4 hours ago

Huh, according to that model card this is a 137B total parameter model.

Performance doesn't seem that good:

- MAI-Code-1-Flash (137B-A5B) = 51% on SWE-bench pro

- Qwen3.6-35B-A3B = 49.5% on SWE-bench pro (https://huggingface.co/Qwen/Qwen3.6-35B-A3B)

They benchmark against Claude Haiku but Haiku is not good, it's worse than tiny open models you can run locally or via API at 10% the cost.

GaryBluto - 3 hours ago

What's with the lack of Microsoft design language on the website? It's painfully obvious they're trying to emulate Anthropic's style here and it looks tacky.

AAYALAG - 15 minutes ago

While competition is always welcome, it’s a questionable strategy to market a model for complex coding tasks when more efficient, lower-cost, or even free alternatives already exist. Small models have a clear niche for daily, low-effort tasks; even if they underperform against SOTA models, there is a massive range of use cases where they are the better fit.

hmokiguess - 4 hours ago

Does anyone actually use these smaller models for coding? If so, how? I usually Opus everything. Is the play to plan/design/architect with a heavier model, then delegate structured tasks to these smaller ones? I'd appreciate hearing from someone who has done and tested both paths.

capten - 4 hours ago

It's so weird to me that the benchmarks remain so low, but the models are marketed as revolutionary. And if you say that low coding capabilities aren't a problem, say that to the token price hike and 'general use' model setup.

Why not sell it as a math agent? Why do I have to set up 4 agents to check each others' work?

cwillu - 2 hours ago

What is with people reimplementing window scrolling badly?

AntiRush - 5 hours ago

The introductory blog post has a lot more information

https://microsoft.ai/news/introducingmai-code-1-flash/

and the model card

https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF

The broader announcement of 7 MAI models seems to be where the 5B active in the title comes from

https://microsoft.ai/news/building-a-hillclimbing-machine-la...

deckar01 - 4 hours ago

If only they had launched that yesterday I might have avoided Copilot auto model selection using a 9x model, quietly burning my monthly quota in a single afternoon.

AJRF - 3 hours ago

Copilot brand is tarnished, so time to bung everything under MAI?

OsrsNeedsf2P - 5 hours ago

So it's trained on the SWE Bench Pro evalset

dang - 2 hours ago

Related ongoing thread:

MAI-Thinking-1 - https://news.ycombinator.com/item?id=48374362 - June 2026 (64 comments)

efields - 4 hours ago

Please test your websites in Safari. Almost all of your iOS users use it by default, and the desktop experience is pretty close to the mobile experience, so testing is easy.

That scroll effect is jank city for me (yeah yeah works fine in Chrome/Edge).

mentos - 4 hours ago

Shouldn’t the next model focus be on system design rather than code?

Seems like the work from a good system design to code is practically solved.

Now it’s a matter of the design of the system. Or is that represented in these evals?

tosh - 4 hours ago

not open weight or at least I did not find anything indicating open weight

npn - 3 hours ago

I personally do not like Microsoft, but congrats to them on releasing this model.

While the scores are not good compared to other open-weight models, the important thing to note is that their training data (as they claim) is very clean, without any synthetic datasets.

onlyrealcuzzo - 5 hours ago

Gemma 4 26B-A4B scored exceptionally well with 20% fewer active params (4B vs. 5B), so this isn't unprecedented.
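
A quick check of that figure, as a rough sketch. It assumes the "A4B"/"A5B" suffixes denote active parameters in billions and the leading number is the total, which matches the "137B A5B" notation used elsewhere in this thread. The helper name `percent_fewer` is made up for illustration.

```python
def percent_fewer(smaller: float, larger: float) -> float:
    """Return how many percent smaller `smaller` is than `larger`."""
    return (1 - smaller / larger) * 100


# Parameter counts in billions, as quoted in this thread (assumed notation).
gemma_total, gemma_active = 26, 4   # Gemma 4 26B-A4B
mai_total, mai_active = 137, 5      # MAI-Code-1-Flash 137B-A5B

print(f"active: {percent_fewer(gemma_active, mai_active):.0f}% fewer")
print(f"total:  {percent_fewer(gemma_total, mai_total):.0f}% fewer")
```

Under those assumptions the 20% gap applies to active parameters only; on total parameters the gap is far larger.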

ajyoon - 4 hours ago

Scroll wheel hijacked on this entire domain

smcleod - 2 hours ago

I don't see the point in comparing yourself to Haiku which is not only useless for coding but also old. No thanks Microsoft.

giancarlostoro - 4 hours ago

Mark Zuckerberg must be in crisis. Microsoft releasing models that compete with Claude's models. Meanwhile the only thing anyone knows about Mark's models is that they help you get hacked more easily.

mmaunder - 4 hours ago

You lost me at forced scrolling. Ugh!

ruined - 2 hours ago

wtf are they doing to the scroll on that page

bguberfain - 4 hours ago

It is good to see big companies like Microsoft launching LLMs. They have a large amount of compute power and good scientists to create useful models.

cainxinth - 2 hours ago

Claude Haiku 4.5 results with 60% fewer tokens. Sounds good, but they don't list token costs.
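
To see why the missing price matters, here is a minimal sketch of the arithmetic. It assumes cost is simply tokens times a single per-token price, ignoring any split between input and output pricing. The function names and the price ratios in the loop are hypothetical; only the 60% figure comes from the claim quoted above.

```python
def relative_cost(token_ratio: float, price_ratio: float) -> float:
    """Cost of model A relative to model B for the same task.

    token_ratio: tokens A uses divided by tokens B uses.
    price_ratio: A's per-token price divided by B's per-token price.
    """
    return token_ratio * price_ratio


def break_even_price_ratio(token_ratio: float) -> float:
    """Price ratio at which both models cost the same for the task."""
    return 1 / token_ratio


token_ratio = 1 - 0.60  # "60% fewer tokens", as quoted in the comment above

print("break-even price ratio:", round(break_even_price_ratio(token_ratio), 2))
for price_ratio in (0.5, 1.0, 3.0):  # hypothetical per-token price ratios
    cost = relative_cost(token_ratio, price_ratio)
    print(f"price ratio {price_ratio}: relative cost {cost:.2f}")
```

In this simplified model, using 40% of the tokens is only a saving if the per-token price is less than 2.5 times the other model's, which is exactly what can't be checked without a published price.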

hootz - 4 hours ago

I'd love to see a tokens per second metric. I always prioritize speed over raw intelligence for flash models.

arunkant - an hour ago

Why do websites still hijack scrolling? It sucks

striking - 4 hours ago

To be clear about the size of the model: MAI-Code-1-Flash is 137B A5B.

randomsc - 2 hours ago

“Build for developers, not benchmarks” is the worst marketing line I've ever heard

gslepak - 4 hours ago

Would be cool if this were an open model.

jMyles - 4 hours ago

I'd really like to get back to an autocomplete flow, ideally one that shares optimized context with my larger agent models.

But it seems like, by and large, even the faster models are now aimed at longer-running agentic flows and not sub-1s autocomplete. Or am I wrong about that?

LoganDark - 4 hours ago

"Clean data" is impossible. Language models have polluted the landscape to such a degree it's impossible to filter them out now. OpenAI has no doubt discarded or muddled their dataset that was used to train the original ChatGPT, so there may be no dataset in existence now that isn't contaminated.

Computer0 - 2 hours ago

I went to VSC specifically to avoid the pricing I started experiencing on Cursor. After this change I have no reason to stick with GH Copilot, I'd rather keep buying OR credits.

ilia-a - 3 hours ago

I mean they are comparing themselves to Haiku of all things, geez that's not a good start...

Marciplan - 4 hours ago

"Build for developers, not benchmarks" Shouldn't that be.. Built?

kylehotchkiss - 4 hours ago

"superintellegence team"

Why not assign them to make Windows good :D

mat0 - an hour ago

how long until they rebrand this shit as copilot?

zb3 - 4 hours ago

So it's not an open model while not being much better? Meh.

freediddy - 4 hours ago

Is 51% good enough to reliably use? There's no world in which I use an AI agent where it gets even 15% of the code wrong; that's as bad as Tesla FSD, where you need to pay attention to the road while engaging FSD. What's the point? My attention is what I'm trying to relieve, not mostly correct functionality. The only thing that matters is whether you can one-shot code like Claude or Codex. I'm not interested in a small but mostly-okay-but-annoyingly-buggy-every-now-and-then AI.

pzo - 3 hours ago

TL;DR: this is just a Claude Haiku alternative; you can probably skip the whole article.

mattlondon - 4 hours ago

Comparing against Claude 4.5? Aren't we up to 4.8? Bit disingenuous?