DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning [pdf]

github.com

74 points by fspeech 3 hours ago


dwohnitmok - 38 minutes ago

Is everyone just glossing over the first-place score of 118/120 on the Putnam?! I mean, we'll see how it does on the upcoming 2025 test, but that's insane!

We've seen absolutely ridiculous progress in model capability over the past year (which is also quite terrifying).

gunalx - 19 minutes ago

If I read it right, it used multiple samples of itself to verify the accuracy, but isn't this problematic?
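
To make the concern concrete, here is a rough sketch of what majority voting over repeated checks buys you. This is not the paper's method; `majority_accuracy` is a hypothetical helper, and it assumes every check is independent and correct with the same probability `p`.

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n checks is correct.

    Assumption (illustrative only): each check is independent and
    correct with probability p. Use an odd n to avoid ties.
    """
    need = n // 2 + 1  # smallest number of correct checks that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

print(majority_accuracy(0.7, 5))
```

Under that assumption, five checks at 70% accuracy each come out at roughly 84% as a group. Samples drawn from the same model are not independent, though: they tend to share the same blind spots, so the real gain is likely smaller than this calculation suggests.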

zaxioms - 3 hours ago

It's cool, but I genuinely cannot fathom why they are targeting natural language proofs instead of a proof assistant.

awei - 3 hours ago

Something is weird here: why is it so hard to have a deterministic program capable of checking a proof or anything math-related? Isn't math fully deterministic, whereas natural language is not? From first principles, it should be possible to do this without an LLM verifier.
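
For reference, deterministic checking of this kind does exist once the proof is written in a formal language. A minimal sketch in Lean 4 (the theorem name `add_comm_example` is made up for illustration; `Nat.add_comm` is from the core library):

```lean
-- The Lean kernel either accepts or rejects this proof term.
-- No sampling and no LLM judgment is involved in the check.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A closed arithmetic fact, checked by computation.
example : 2 + 2 = 4 := rfl
```

The hard part is not the checking but getting there: a natural-language proof first has to be translated into a form like this, and that translation step is not deterministic.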

photon_lines - 3 hours ago

Exciting stuff from a fantastic team.

newyankee - an hour ago

That is amazing if they can do all of this at <10% of the cost of frontier labs. Of course, they work in the shadow of the great work done and shared by the frontier labs, but there is some exceptionally high-speed execution happening behind the scenes. It shows this is clearly a race, but a race where China is happy to be #2 as long as the gap is not significant and the costs are reasonable.

agentultra - 2 hours ago

So it's designed for informal proofs, and it "verifies" based on a rubric-fitting function and human interaction. Is that right?

What's the use case for a system like this?