R-Zero: Self-Evolving Reasoning LLM from Zero Data

arxiv.org

121 points by lawrenceyan 4 days ago


Iv - 3 days ago

"Starting from a single base LLM"

Ok, zero data, except the data used in the teacher model.

thom - 3 days ago

For values of zero quite far above zero.

jasonjmcghee - 4 days ago

Conceptually, it's effectively a GAN.

nakamoto_damacy - 3 days ago

Perpetual Motion Machines were a thing at some point, too.

clbrmbr - 3 days ago

Terrible choice of name. DeepSeek developed a historically important model called “R1-Zero” (the predecessor to R1, trained without any cold-start SFT; it was very strong, but its chain of thought was hard to read because it code-switches into Chinese and has no line breaks).

Davidzheng - 3 days ago

I think in a formal domain like Lean it should actually be possible to do it from zero, but there seem to be no major successes so far.
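A rough illustration of why formal domains are attractive here (this is an assumption about what the comment has in mind, not something taken from the paper): in Lean 4 every proof is checked by the kernel, so whether a generated proof is correct can be decided automatically, without any labeled data. Both statements below are accepted; changing `4` to `5` in the first one makes the check fail.

```lean
-- Lean 4 (core only, no Mathlib assumed).
-- Proved by definitional unfolding: the kernel evaluates 2 + 2 and compares it to 4.
example : 2 + 2 = 4 := rfl

-- Proved by appealing to the core lemma Nat.add_comm.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

A binary accept/reject signal like this is the kind of automatic verifier a self-training loop could, in principle, use as its reward.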

lawlessone - 3 days ago

OK, but how do you ensure it's improving in a direction that aligns with reality?

freejazz - 3 days ago

I still don't understand what a "reasoning" LLM is

neuroelectron - 3 days ago

Now gamify it.

cyberge99 - 4 days ago

What could go wrong?