Who Can Understand the Proof? A Window on Formalized Mathematics
writings.stephenwolfram.com190 points by ColinWright 3 months ago
190 points by ColinWright 3 months ago
One of the perennial questions about proof automation has been the utility of proofs that cannot be understood by humans.
Generally, most computer scientists using proof automation don't care about the proof itself -- they care that one exists. It can contain as many lemmas and steps as needed. They're unlikely to ever read it.
It seems to me that LLMs would be decent at generating proofs this way: so long as they submit their tactics to the proof checker and the proof is found they can generate whatever is needed.
However for mathematicians, of which I am not a member of that distinguished group, seem to appreciate qualities in proofs such as elegance and simplicity. Many mathematicians that I've heard respond to the initial question believe that a proof generated by some future AI system will not be useful to humans if they cannot understand and appreciate it. The existence of a proof is not enough.
Now that we're getting close to having algorithms that can generate proofs it makes the question a bit more urgent, I think. What use is a proof that isn't elegant? Are proofs written for a particular audience or are they written for the result?
Mathematician here (trained as pure, working as applied). Non-elegant proofs are useful, if the result is important. e.g. People would still be excited by an ugly proof of the Riemann hypothesis.^1 It's important too a lot of other theorems if this is true or not. However, if the result is less central you won't get a lot of interest.
Part of it is, I think, that "elegance" is flowery language that hides what mathematicians really want: not so much new proofs as new proof techniques and frameworks. An "elegant" proof can, with some modification, prove a lot more than its literal statement. That way, even if you don't care much about the specific result, you may still be interested because it can be altered to solve a problem you _were_ interested in.
1: It doesn't have to be as big of a deal as this.
Then again, even an 'elegant' proof can be surprisingly inflexible. I've recently been working through Apéry's proof that ζ(3) is irrational. It's so simple that even a clueless dabbler like me can understand all the details. Yet no one has been able to make his construction work directly for anything else (that hasn't already been proven irrational). C'est la vie, I suppose.
There was a post yesterday of a quanta article: https://news.ycombinator.com/item?id=42644896.
The article explains that two mathematicians were able to place Apery's proof that ζ(3) is irrational into a much wider (and hence more powerful) framework. I doubt that framework is as easy to understand as the original proof. But in the end something with wider applicability did come out of the proof.
Yeah, many of the fancy analytic methods are beyond my level of dabbling. I've been trying to learn more about them, so I can solve the myriad exercises left to the reader in all the Diophantine approximation papers.
Still, the newer methods publicized in the Quanta article definitely get more involved, and at least from my perspective they don't establish things as elegantly as Apéry's ζ(2) and ζ(3) arguments do. Hopefully they turn out to be powerful in practice, to make up for it.
One thing that many mathematicians today don’t think about is how deeply intertwined the field has historically been with theology. This goes back to the Pythagoreans at least.
That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness. Which, incidentally, are directly related to logic, aesthetics, and ethics.
The value of truth in a proof is most obvious.
The value of aesthetics is harder to explain, but there's no denying that it is in fact observably valued by mathematicians.
As for ethics, remember that human morality is a proper subset thereof. Ethics concerns itself with what is good. It may feel like a stretch, but it's perfectly reasonable to say that for two equally true proofs of the same thing, the one that is more beautiful is also more good. Also, obviously, given two equally beautiful proofs, if only one is true then it is also more good.
> That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness
As a non-mathematician, I've noticed this as well, and I have a suspicion the historical "culture" is holding the field back. Gödel proved there are an infinite number of true arithmetic statements unprovable within any (consistent, sufficiently powerful) formal system. But our "gold standard" formal system, ZFC, has about as many axioms as we have fingers — why is finding more axioms not the absolute highest priority of the field?
We struggle to prove facts about Turing machines with only six states, and it's not obvious to me that ZFC is even capable of resolving all questions about the behavior of six state Turing machines (well, specifically just ZF, as C has no bearing on these questions).
Yet Turing machines are about as far from abstract mathematics as one can get, because you can actually build these things in our physical universe and observe their behavior over time (except for the whole "infinite tape" part). If we can't predict the behavior of the majority of tiny, deterministic systems with ZFC, what does that say about our ability to understand and predict real world data, particularly considering that this data likely has an underlying algorithmic structure vastly more complex than that of a six state Turing machine?
More formally, my complaint with the culture of mathematics is:
1) We know that for any string of data, I(data : ZFC) ≤ min(K(data), K(ZFC)) + O(1)
2) K(ZFC) is likely no more than a few bytes. I think the best current upper bound is the description length of a Turing machine with a few hundred states, but I suspect the true value of K(ZFC) is far lower than that
3) Thus K(data) - K(data | ZFC) ≤ "a few bytes"
Consider the massive amounts of data that we collect to train LLMs. The totality of modern mathematics can provide no more than a few bytes of insight into the "essence" of this data (i.e., the maximally compressed version of the data). Which directly translates to limited predictability of the data via Solomonoff induction. And that's in principle — this doesn't even consider the amount of time involved. If we want to do better, we need more axioms, full stop.
One might counter, "well sure, but mathematicians don't necessarily care about real world problems". Ok, just apply the same argument to the set of all arithmetic truths. Or the set of unprovable statements in the language of a formal system (that are true within some model). That's some interesting data. Surely ZFC can discover most "deep" mathematical truths? Not very likely. The deeper truths tend to occur at higher levels of the arithmetic hierarchy. The higher in the hierarchy, the more interesting the statement. And these are tiny statements too: ∀x ∃y ∀z [...]. Well we're already in trouble because ZFC can only decide a small fraction of the Π_2 statements that can fit on a napkin and it drops off very quickly at higher levels than that. Again, we need more axioms.
> Yet Turing machines are about as far from abstract mathematics as one can get, because you can actually build these things in our physical universe and observe their behavior over time (except for the whole "infinite tape" part)
The infinite tape part isn't some minor detail, it's the source of all the difficulty. A "finite-tape Turing machine" is just a DFA.
> is just a DFA
Oh is that all? If resource bounded Kolmogorov complexity is that simple, we should have solved P vs NP by now!
I debated adding a bunch of disclaimers to that parenthetical about when the infinite tape starts to matter, but thought, nah, surely that won’t be the contention of the larger discussion point here haha
It’s an LBA, a Linear Bounded Automata.
No, an LBA in general doesn't have a finite tape. It still has an infinite tape, to accommodate arbitrary length inputs, it's just that the tape cannot grow beyond the length of its input (or a constant multiple of it, which is equivalent by the linear speedup trick).