Lean proved this program correct; then I found a bug

kirancodes.me

142 points by bumbledraven 4 hours ago


ctmnt - 3 hours ago

This article’s framing and title are odd. The author, in fact, found no bugs or errors in the proven code. She says so at the end of the article:

> The two bugs that were found both sat outside the boundary of what the proofs cover. The denial-of-service was a missing specification. The heap overflow was a deeper issue in the trusted computing base, the C++ runtime that the entire proof edifice assumes is correct.

Still an interesting and useful result to find a bug in the Lean runtime, but I’d argue that doesn’t justify the title. Or the claim that “the entire proof edifice” is somehow shaky.

It’s important to note that this is the Lean runtime that has a bug, not the Lean kernel, which is the part that actually does the verification (aka proving). [1] So it’s not even immediately clear what this bug would really apply to, since obviously no one’s running any compiled Lean code in any kind of production hot path.

[1] https://lean-lang.org/doc/reference/latest/Elaboration-and-C...

porcoda - 3 hours ago

I’ve had similar experiences with code I’ve proven correct, although my issues were of the more common variety than the overflow issue - subtle spec bugs. (I think the post mentions the denial of service issue as related to this: a spec gap)

If you have a spec that isn’t correct, you can certainly write code that conforms to that spec and write proofs to support it. It just means you have verified a program that does something other than what you intended. This is one of the harder parts of verification: clearly expressing your intention as a human. As programs get more complex these get harder to write, which means it isn’t uncommon to have lean or rocq proofs for everything only to later find “nope, it has a bug that ultimately traces back to a subtle specification defect.” Once you’ve gone through this a few times you quickly realize that tools like lean and rocq are tricky to use effectively.

I kinda worry that the “proof assistants will fix ai correctness” will lead to a false sense of assurance if the specs that capture human intention don’t get scrutinized closely. Otherwise we’ll likely have lots of proofs for code that isn’t the code the humans actually intended due to spec flaws.

ajaystream - an hour ago

The spec-completeness problem here is the same one that bites distributed systems verification: the proof holds inside an operating envelope (no adversarial inputs, trusted runtime, bounded sizes), and the interesting failures live at the boundary. TLA+ has the same property - you can prove liveness under a fairness assumption the deployment silently violates, and nothing in the proof tells you when reality drifted outside.

What I'd actually want from the tooling is a machine-checkable statement of the envelope itself, propagated as a runtime guard rather than a compile-time comment. Then "proof holds" and "we are still inside the proof's domain" are two separate, observable properties, and the unverified-parser / unverified-runtime cases stop being invisible.

youknownothing - 2 hours ago

I'll probably get a lot of hate mail for this but here goes nothing... Despite what many people like to claim, you cannot prove that a program has no bugs. That means proving the absence of bugs, and you cannot prove a negative. The best thing you can do is fail to find a bug, but that doesn't mean it isn't there.

Before everyone starts blabbing about formal verification, etc., consider this: how do you know that you didn't make a mistake in your formal verification? IOW, how do you know your formal verification is bug-free? Answer: you don't. Or if you try to formally verify your formal verification then you're just translating the problem to a new layer. It's just a chain of proofs that is always ultimately based on an unproven one, which invalidates the whole chain.

You can get asymptotically close to zero-bug proof, but you can never get there 100% of the way.

grg0 - 2 hours ago

Clickbait title, the proved part of the program had no bugs?

As an aside, why can't people just write factually? This isn't a news site gamed for ad revenue. It's also less effort. I felt this post was mostly an insulting waste of time. I come to HN to read interesting stuff.

lmm - 3 hours ago

Not verifying the parser seems like a pretty big oversight. Parsing binary formats is notoriously dangerous!

davesque - 3 hours ago

Am I reading the article wrong? It appears that the author did not test the claims of the proof. Wouldn't a "bug" in this case mean she found an input that did not survive a round trip through the compression algorithm?

Update: Actually, I guess this may have been her point: "The two bugs that were found both sat outside the boundary of what the proofs cover." So then I guess the title might be a bit click baity.

mindcrime - 3 hours ago

"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

Animats - 39 minutes ago

Compression/decompression is a good problem for proof of correctness. The specification is very simple (you must get back what you put in), while the implementation is complex.

What seems to have happened here is that the storage allocator underneath is unverified. That, too, has a relatively simple spec - all buffers disjoint, no lost buffers, no crashes.

spullara - 3 hours ago

claude making a statement that sounds impressive but it is actually the first codebase it has ever analyzed.

"This is genuinely one of the most memory-safe codebases I've analyzed."

dchftcs - 3 hours ago

This is analogous to the fundamental problem of better automation in programming - eventually, the complexity and correctness of of the spec takes over, and if we don't manage that well, creating the spec is not that much less work than the programming part. If your program was for the wrong thing, a proof of it is also wrong.

vatsachak - 2 hours ago

You guys are missing the forest for the trees.

They used an AI agent sending ideas to a fuzzer and discovered a heap buffer overflow in Lean. This is big.

sebstefan - an hour ago

So that's just one more win for formal verification despite the title it seems

I'm genuinely excited about AI agents and formal verification languages. To me it's obviously the way forward instead of moonshots trying to make agents that program in their own AI blackbox binary, or agents that code in current programming languages.

If we are heading in the direction of "huge codebases that nobody has written", or, "code is an artifact for the machine", I don't see a way out without making it proved.

If humans can review and edit the spec, then verify that the implementation matches it, suddenly leaving the implementation be an artifact for the machines seems okay

The downside of provers also being that they are a massive pain in the ass that very few want to use, this is also a complete win.

ozten - an hour ago

HN to be renamed Flex Social. Damn.

akoboldfrying - an hour ago

Nice work. Amusing that Lean's own standard library has a buffer overflow bug, which "leaked out" due to being exempted from the verification.

Regarding the DoS in the lean-zip application itself: I think this is a really neat example of the difficult problem of spec completeness, which is a subcase of the general problem (mentioned by porcoda in a sibling comment) of being sure that the spec is checking the right things. For a compression program, the natural, and I would also say satisfyingly beautiful thing to prove is that decomp(comp(x)) = x for all possible inputs x. It's tempting to think that at that point, "It's proven!" But of course the real world can call decomp() on something that has never been through comp() at all, and this simple, beautiful spec is completely silent on what will happen then.

readthenotes1 - 2 hours ago

I didn't like the clickbait title. I would have preferred something along the lines of

"Lean proves other program correct but not itself"

jongjong - 2 hours ago

Formal proofs can only ever work to validate against requirements... But a major issue with a lot of modern software is that the requirements themselves can be incorrect and some requirements can logically conflict with other requirements... So long as we have non-technical people formulating requirements, we can never have provably correct software.

I've come across issues in the past which weren't actually bugs. For example, the software was behaving exactly as intended but it looked like a bug to a user who didn't understand the nuances and complexities of what was happening.

I also cannot count how many times a non-technical person asked me to implement conflicting functionality or functionality which would have made the UX incredibly confusing for the user.

san_tekart - 2 hours ago

[dead]

ernsheong - 2 hours ago

Alan Turing already proved with the Halting Problem that reasoning about program correctness is not possible. But we still try.

Wikipedia: [1] Turing proved no algorithm exists that always correctly decides whether, for a given arbitrary program and input, the program halts when run with that input. The essence of Turing's proof is that any such algorithm can be made to produce contradictory output and therefore cannot be correct.

[1] https://en.wikipedia.org/wiki/Halting_problem