Xiaomi MiMo Reasoning Model

github.com

482 points by thm a year ago


Arcuru - a year ago

From the paper, I was intrigued by how they handled their RL step for code data. They trained on hard but solvable code-generation tasks, scoring rollouts by running unit tests. Do other models include that training step?

> Code Data For coding problems, we curate a high-quality training set comprising open-source datasets and our newly collected problem set. We remove problems without test cases. For problems with golden solutions, we exclude those where the golden solution failed to pass all test cases. For problems without golden solution, we discard problems where no test case can be solved in 16 rollouts of advanced reasoning models. Similar to math data, we utilize an SFT version of MiMo-7B to filter out easy problems that are perfectly solved in all 16 rollouts. This rigorous cleaning process yields 30K code problems.
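The difficulty filter described above (keep a problem only if it is solvable in some of 16 rollouts but not all of them) can be sketched as follows. This is a minimal illustration, not the paper's code: `count_passes` is a hypothetical stand-in for actually running 16 SFT-model rollouts and unit-testing each one, and here it just reads simulated data.

```python
N_ROLLOUTS = 16

def count_passes(problem):
    """Hypothetical stand-in for the real pipeline step: run 16
    rollouts of the SFT model and unit-test each generated
    solution. Here the pass count is just simulated data."""
    return problem["simulated_passes"]

def filter_problems(problems):
    """Keep problems that are solvable but not trivially easy:
    at least one pass and fewer than 16/16 passes."""
    kept = []
    for p in problems:
        passes = count_passes(p)
        if 0 < passes < N_ROLLOUTS:
            kept.append(p["name"])
    return kept

problems = [
    {"name": "trivial",  "simulated_passes": 16},  # dropped: too easy
    {"name": "hard",     "simulated_passes": 3},   # kept
    {"name": "unsolved", "simulated_passes": 0},   # dropped: unsolvable
]
print(filter_problems(problems))  # ['hard']
```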

> During each RL iteration, we evaluate thousands of problems to compute the rewards, with each problem potentially containing hundreds of test cases. To improve reward computing efficiency and eliminate GPU idle time, we developed an online judge environment that enables parallel execution of extremely high-volume unit tests.
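The reward step they describe (many problems per iteration, hundreds of test cases per problem, executed in parallel) could look roughly like this. A sketch only, under simplifying assumptions: a real judge would sandbox each test in a separate process, while here `run_test` is a plain in-process function call, and `compute_reward` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor

def run_test(solution_fn, test):
    """Execute one unit test against a candidate solution. In a
    real online judge this would be a sandboxed subprocess; here
    it is a plain function call (an illustrative simplification)."""
    try:
        return solution_fn(test["input"]) == test["expected"]
    except Exception:
        return False

def compute_reward(solution_fn, tests, max_workers=8):
    """Fraction of test cases passed, with the tests run in
    parallel so reward computation does not serialize behind
    hundreds of per-problem test cases."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda t: run_test(solution_fn, t), tests))
    return sum(results) / len(results)

tests = [{"input": i, "expected": i * i} for i in range(100)]
print(compute_reward(lambda x: x * x, tests))  # 1.0
```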

lvl155 - a year ago

Why are there so many English-first AI models from China? Are they not interested in serving their own population? Or is it that if they publish Chinese-first models it won't get publicity in the West?

siliconc0w - a year ago

This is incredibly strong coding performance for a 7B. I use Gemini 2.5 Pro, which scored 67.8; this scored 57.8, very close to Gemini 2.5 Flash at 60.6.

I've become pretty skeptical of eval results given what we've heard about Llama 4, so we'll see where this lands on the closed evals, but it's very impressive to see.

jedisct1 - a year ago

GGUF version (for LM Studio, Ollama, etc): https://huggingface.co/jedisct1/MiMo-7B-RL-GGUF

rahimnathwani - a year ago

When you guys use GGUF files in Ollama, do you normally create a Modelfile to go with them, or just hope that whatever defaults Ollama has work with the new model?

https://github.com/ollama/ollama/blob/main/docs/modelfile....
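For reference, a minimal Modelfile for a local GGUF usually just needs a FROM line pointing at the file plus a chat template. Everything below (the filename, the parameter value, and especially the template) is illustrative and should be checked against the model card:

```
# Modelfile -- filename, parameter value, and template are
# illustrative; verify the correct chat template on the model card
FROM ./MiMo-7B-RL-Q4_K_M.gguf
PARAMETER temperature 0.6
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```

Then something like `ollama create mimo -f Modelfile` registers it locally.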

gizmodo59 - a year ago

It's funny to see benchmarks that omit the top-performing models like o3 (currently the best model on many benchmarks), Gemini Pro, and Claude 3.7.

badmonster - a year ago

MiMo-7B claims to outperform larger models like Qwen-32B and match OpenAI o1-mini on math/code benchmarks, all with a 7B model trained from scratch. Is this a sign that pretraining + RL optimization is finally outpacing scale? Or are we just getting better at benchmarking narrow capabilities?

xpe - a year ago

The README says "RL" without specifying what kind of RL is used. Researchers: I know you are busy, and I know good writing takes time, but please don't skip this kind of detail.

Jotalea - a year ago

I wonder if they will use this model for their AI assistant on their Xiaomi 15 series phones. They most likely will. I'm not really sure what to expect from it.

ramesh31 - a year ago

These benchmark numbers cannot be real for a 7B model.