How empirically certain can we be in any use of mathematical reasoning to make empirical claims? In contrast to errors in many other forms of knowledge such as medicine or psychology, which have enormous literatures classifying and quantifying error rates, rich methods of meta-analysis and pooling expert belief, and much one can say about the probability of any result being true, mathematical error has been rarely examined except as a possibility and a motivating reason for research into formal methods. There is little known beyond anecdotes about how often published proofs are wrong, in what ways they are wrong, the impact of such errors, how errors vary by subfield, what methods decrease (or increase) errors, and so on. Yet, mathematics is surely not immune to error, and for all the richness of the subject, mathematicians can usually agree at least informally on what has turned out to be right or wrong1, or good by other criteria like fruitfulness or beauty. Gaifman 2004 claims that errors are common but any such analysis would be unedifying:
An agent might even have beliefs that logically contradict each other. Mersenne believed that 267-1 is a prime number, which was proved false in 1903, cf. Bell (1951). [The factorization, discovered by Cole, is: 193,707,721 × 761,838,257,287.]…Now, there is no shortage of deductive errors and of false mathematical beliefs. Mersenne’s is one of the most known in a rich history of mathematical errors, involving very prominent figures (cf. De Millo et al. 1979, 269–270). The explosion in the number of mathematical publications and research reports has been accompanied by a similar explosion in erroneous claims; on the whole, errors are noted by small groups of experts in the area, and many go unheeded. There is nothing philosophically interesting that can be said about such failures.2
I disagree. Quantitative approaches cannot capture everything, but why should we believe mathematics is, unlike so many other fields like medicine, uniquely unquantifiable and ineffably inscrutable? As a non-mathematician looking at mathematics largely as a black box, I think such errors are quite interesting, for several reasons: given the extensive role of mathematics throughout the sciences, errors have serious potential impact; but in collecting all the anecdotes I have found, the impact seems skewed towards errors in quasi-formal proofs but not the actual results. One might say that reviewing math errors, the stylized summary is “although the proofs are usually wrong, the results are usually right.”
I find this highly surprising and nontrivial, and in striking contrast to other fields I am familiar with, like sociology or psychology, where usually wrong methods lead to wrong results—it is not the case in the Replication Crisis that flaws like p-hacking are merely ‘framing a guilty man’, because followup with more rigorous methods typically shows effects far smaller than measured or predicted, or outright reversal of direction. This difference may tell us something about what it is that mathematicians do subconsciously when they “do math”, or why conjecture resolution times are exponentially-distributed, or what the role of formal methods ought to be, or what we should think about practically important but unresolved problems like P = NP.
“Beware of bugs in the above code; I have only proved it correct, not tried it.”
“When you have eliminated the impossible, whatever remains is often more improbable than your having made a mistake in one of your impossibility proofs.”
In some respects, there is nothing to be said; in other respects, there is much to be said. “Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes” discusses a basic issue with existential threats: any useful discussion will be rigorous, hopefully with physics and math proofs; but proofs themselves are empirically unreliable. Given that mathematical proofs have long been claimed to be the most reliable form of epistemology humans know and the only way to guarantee truth3, this sets a basic upper bound on how much confidence we can put on any belief, and given the lurking existence of systematic biases, it may even be possible for there to be too much evidence for a claim (Gunn et al 2016). There are other rare risks, from mental diseases4 to hardware errors5 to how to deal with contradictions6, but we’ll look at mathematical error.
“When I asked what it was, he said, ‘It is the probability that the test bomb will ignite the whole atmosphere.’ I decided I would check it myself! The next day when he came for the answers I remarked to him, ‘The arithmetic was apparently correct but I do not know about the formulas for the capture cross sections for oxygen and nitrogen—after all, there could be no experiments at the needed energy levels.’ He replied, like a physicist talking to a mathematician, that he wanted me to check the arithmetic not the physics, and left. I said to myself, ‘What have you done, Hamming, you are involved in risking all of life that is known in the Universe, and you do not know much of an essential part?’ I was pacing up and down the corridor when a friend asked me what was bothering me. I told him. His reply was, ‘Never mind, Hamming, no one will ever blame you.’”
“…of the two major thermonuclear calculations made that summer at Berkeley, they got one right and one wrong.”
This upper bound on our certainty may force us to disregard certain rare risks because the effect of error on our estimates of existential risks is asymmetric: an error will usually reduce the risk, not increase it. The errors are not distributed in any kind of symmetrical around a mean: an existential risk is, by definition, bumping up against the upper bound on possible damage. If we were trying to estimate, say, average human height, and errors were distributed like a bell curve, then we could ignore them. But if we are calculating the risk of a super-asteroid impact which will kill all of humanity, an error which means the super-asteroid will actually kill humanity twice over is irrelevant because it’s the same thing (we can’t die twice); however, the mirror error—the super-asteroid actually killing half of humanity—matters a great deal!
How big is this upper bound? Mathematicians have often made errors in proofs. But it’s rarer for ideas to be accepted for a long time and then rejected. But we can divide errors into 2 basic cases corresponding to type I and type II errors:
- Mistakes where the theorem is still true, but the proof was incorrect (type I)
- Mistakes where the theorem was false, and the proof was also necessarily incorrect (type II)
Before someone comes up with a final answer, a mathematician may have many levels of intuition in formulating & working on the problem, but we’ll consider the final end-product where the mathematician feels satisfied that he has solved it. Case 1 is perhaps the most common case, with innumerable examples; this is sometimes due to mistakes in the proof that anyone would accept is a mistake, but many of these cases are due to changing standards of proof. For example, when David Hilbert discovered errors in Euclid’s proofs which no one noticed before, the theorems were still true, and the gaps more due to Hilbert being a modern mathematician thinking in terms of formal systems (which of course Euclid did not think in). (David Hilbert himself turns out to be a useful example of the other kind of error: his famous list of 23 problems was accompanied by definite opinions on the outcome of each problem and sometimes timings, several of which were wrong or questionable7.) Similarly, early calculus used ‘infinitesimals’ which were sometimes treated as being 0 and sometimes treated as an indefinitely small non-zero number; this was incoherent and strictly speaking, practically all of the calculus results were wrong because they relied on an incoherent concept—but of course the results were some of the greatest mathematical work ever conducted8 and when later mathematicians put calculus on a more rigorous footing, they immediately re-derived those results (sometimes with important qualifications), and doubtless as modern math evolves other fields have sometimes needed to go back and clean up the foundations and will in the future.9 Other cases are more straightforward, with mathematicians publishing multiple proofs/patches10 or covertly correcting papers11. Sometimes they make it into textbooks: Carmichael realized that his proof for Carmichael’s totient function conjecture, which is still open, was wrong only after 2 readers saw it in his 1914 textbook The Theory of Numbers and questioned it. Attempts to formalize results into experimentally-verifiable results (in the case of physics-related math) or machine-checked proofs, or at least some sort of software form, sometimes turns up issues with12 accepted13 results14, although not always important (eg. the correction in Romero & Rubio 2013). Poincaré points out this mathematical version of the pessimistic induction in “Intuition and Logic in Mathematics”:
Strange! If we read over the works of the ancients we are tempted to class them all among the intuitionalists. And yet nature is always the same; it is hardly probable that it has begun in this century to create minds devoted to logic. If we could put ourselves into the flow of ideas which reigned in their time, we should recognize that many of the old geometers were in tendency analysts. Euclid, for example, erected a scientific structure wherein his contemporaries could find no fault. In this vast construction, of which each piece however is due to intuition, we may still today, without much effort, recognize the work of a logician.
… What is the cause of this evolution? It is not hard to find. Intuition can not give us rigour, nor even certainty; this has been recognized more and more. Let us cite some examples. We know there exist continuous functions lacking derivatives. Nothing is more shocking to intuition than this proposition which is imposed upon us by logic. Our fathers would not have failed to say: “It is evident that every continuous function has a derivative, since every curve has a tangent.” How can intuition deceive us on this point?
… I shall take as second example Dirichlet’s principle on which rest so many theorems of mathematical physics; today we establish it by reasonings very rigorous but very long; heretofore, on the contrary, we were content with a very summary proof. A certain integral depending on an arbitrary function can never vanish. Hence it is concluded that it must have a minimum. The flaw in this reasoning strikes us immediately, since we use the abstract term function and are familiar with all the singularities functions can present when the word is understood in the most general sense. But it would not be the same had we used concrete images, had we, for example, considered this function as an electric potential; it would have been thought legitimate to affirm that electrostatic equilibrium can be attained. Yet perhaps a physical comparison would have awakened some vague distrust. But if care had been taken to translate the reasoning into the language of geometry, intermediate between that of analysis and that of physics, doubtless this distrust would not have been produced, and perhaps one might thus, even today, still deceive many readers not forewarned.
…A first question presents itself. Is this evolution ended? Have we finally attained absolute rigour? At each stage of the evolution our fathers also thought they had reached it. If they deceived themselves, do we not likewise cheat ourselves?
We believe that in our reasonings we no longer appeal to intuition; the philosophers will tell us this is an illusion. Pure logic could never lead us to anything but tautologies; it could create nothing new; not from it alone can any science issue. In one sense these philosophers are right; to make arithmetic, as to make geometry, or to make any science, something else than pure logic is necessary.
Isaac Newton, incidentally, gave two proofs of the same solution to a problem in probability, one via enumeration and the other more abstract; the enumeration was correct, but the other proof totally wrong and this was not noticed for a long time, leading Stigler to remark:15
If Newton fooled himself, he evidently took with him a succession of readers more than 250 years later. Yet even they should feel no embarrassment. As Augustus De Morgan once wrote, “Everyone makes errors in probabilities, at times, and big ones.” (Graves, 1889, page 459)
“Lefschetz was a purely intuitive mathematician. It was said of him that he had never given a completely correct proof, but had never made a wrong guess either.”
Case 2 is disturbing, since it is a case in which we wind up with false beliefs and also false beliefs about our beliefs (we no longer know that we don’t know). Case 2 could lead to extinction.
The prevalence of case 1 might lead us to be very pessimistic; case 1, case 2, what’s the difference? We have demonstrated a large error rate in mathematics (and physics is probably even worse off). Except, errors do not seem to be evenly & randomly distributed between case 1 and case 2. There seem to be far more case 1s than case 2s, as already mentioned in the early calculus example: far more than 50% of the early calculus results were correct when checked more rigorously. Richard Hamming (1998) attributes to Ralph Boas a comment that while editing Mathematical Reviews that “of the new results in the papers reviewed most are true but the corresponding proofs are perhaps half the time plain wrong”. (WP mentions as well that “His first mathematics publication was written…after he discovered an incorrect proof in another paper.”) Gian-Carlo Rota gives us an example with Hilbert:
Once more let me begin with Hilbert. When the Germans were planning to publish Hilbert’s collected papers and to present him with a set on the occasion of one of his later birthdays, they realized that they could not publish the papers in their original versions because they were full of errors, some of them quite serious. Thereupon they hired a young unemployed mathematician, Olga Taussky-Todd, to go over Hilbert’s papers and correct all mistakes. Olga labored for three years; it turned out that all mistakes could be corrected without any major changes in the statement of the theorems. There was one exception, a paper Hilbert wrote in his old age, which could not be fixed; it was a purported proof of the continuum hypothesis, you will find it in a volume of the Mathematische Annalen of the early thirties. At last, on Hilbert’s birthday, a freshly printed set of Hilbert’s collected papers was presented to the Geheimrat. Hilbert leafed through them carefully and did not notice anything.17
So only one of those papers was irreparable, while all the others were correct and fixable? Rota himself experienced this:
Now let us shift to the other end of the spectrum, and allow me to relate another personal anecdote. In the summer of 1979, while attending a philosophy meeting in Pittsburgh, I was struck with a case of detached retinas. Thanks to Joni’s prompt intervention, I managed to be operated on in the nick of time and my eyesight was saved. On the morning after the operation, while I was lying on a hospital bed with my eyes bandaged, Joni dropped in to visit. Since I was to remain in that Pittsburgh hospital for at least a week, we decided to write a paper. Joni fished a manuscript out of my suitcase, and I mentioned to her that the text had a few mistakes which she could help me fix. There followed twenty minutes of silence while she went through the draft. “Why, it is all wrong!” she finally remarked in her youthful voice. She was right. Every statement in the manuscript had something wrong. Nevertheless, after laboring for a while, she managed to correct every mistake, and the paper was eventually published.
There are two kinds of mistakes. There are fatal mistakes that destroy a theory; but there are also contingent ones, which are useful in testing the stability of a theory.
- The result of Problem 11 contradicts the results announced by Levy [1963b]. Unfortunately, the construction presented there cannot be completed.
- The transfer to ZF was also claimed by Marek  but the outlined method appears to be unsatisfactory and has not been published.
- A contradicting result was announced and later withdrawn by Truss .
- The example in Problem 22 is a counterexample to another condition of Mostowski, who conjectured its sufficiency and singled out this example as a test case.
- The independence result contradicts the claim of Felgner  that the Cofinality Principle implies the Axiom of Choice. An error has been found by Morris (see Felgner’s corrections to ).
And referred me also to the entries in the index of Fourier Analysis by Tom Körner concerning the problem of the “pointwise convergence of Fourier series”:
The (in)famous Jacobian Conjecture was considered a theorem since a 1939 publication by Keller (who claimed to prove it). Then Shafarevich found a new proof and published it in some conference proceedings paper (in early 1950-ies). This conjecture states that any polynomial map from C^2 to C^2 is invertible if its Jacobian is nowhere zero. In 1960-ies, Vitushkin found a counterexample to all the proofs known to date, by constructing a complex analytic map, not invertible and with nowhere vanishing Jacobian. It is still a main source of embarrassment for Arxiv.org contributors, who publish about 3–5 false proofs yearly. Here is a funny refutation for one of the proofs: “Comment on a Paper by Yucai Su On Jacobian Conjecture (Dec 30, 2005)”
The problem of Jacobian Conjecture is very hard. Perhaps it will take human being another 100 years to solve it. Your attempt is noble, Maybe the Gods of Olympus will smile on you one day. Do not be too disappointed. B. Sagre has the honor of publishing three wrong proofs and C. Chevalley mistakes a wrong proof for a correct one in the 1950’s in his Math Review comments, and I.R. Shafarevich uses Jacobian Conjecture (to him it is a theorem) as a fact…
This look into the proverbial sausage factory should not come as a surprise to anyone taking an Outside View: why wouldn’t we expect any area of intellectual endeavour to have error rates within a few orders of magnitude as any other area? How absurd to think that the rate might be ~0%; but it’s also a little questionable to be as optimistic as Anders Sandberg’s mathematician friend: “he responded that he thought a far smaller number [1%] of papers in math were this flawed.”
Other times, the correct result is known and proven, but many are unaware of the answers19. The famous Millennium Problems—those that have been solved, anyway—have a long history of failed proofs (Fermat surely did not prove Fermat’s Last Theorem & may have realized this only after boasting20 and neither did Lindemann21). What explains this? The guiding factor that keeps popping up when mathematicians make leaps seems to go under the name of ‘elegance’ or mathematical beauty, which widely considered important222324. This imbalance suggests that mathematicians are quite correct when they say proofs are not the heart of mathematics and that they possess insight into math, a 6th sense for mathematical truth, a nose for aesthetic beauty which correlates with veracity: they disproportionately go after theorems rather than their negations.
Why this is so, I do not know.
Outright Platonism like Godel apparently believed in seems unlikely—mathematical expertise resembles a complex skill like chess-playing more than it does a sensory modality like vision. Possibly they have well-developed heuristics and short-cuts and they focus on the subsets of results on which those heuristics work well (the drunk searching under the spotlight), or perhaps they do run full rigorous proofs but are doing so subconsciously and merely express themselves ineptly consciously with omissions and erroneous formulations ‘left as an exercise for the reader’25.
We could try to justify the heuristic paradigm by appealing to as-yet poorly understood aspects of the brain, like our visual cortex: argue that what is going on is that mathematicians are subconsciously doing tremendous amounts of computation (like we do tremendous amounts of computation in a thought as ordinary as recognizing a face), which they are unable to bring up explicitly. So after prolonged introspection and some comparatively simple explicit symbol manipulation or thought, they feel that a conjecture is true and this is due to a summary of said massive computations.
Perhaps they are checking many instances? Perhaps they are white-box testing and looking for boundaries? Could there be some sort of “logical probability” where going down possible proof-paths yield probabilistic information about the final target theorem, maybe in some sort of Monte Carlo tree search of proof-trees? Do sleep serve to consolidate & prune & replay memories of incomplete lines of thought, finetuning heuristics or intuitions for future attacks and getting deeper into a problem (perhaps analogous to expert iteration)? Reading great mathematicians like Terence Tao discuss the heuristics they use on unsolved problems26, they bear some resemblances to computer science techniques. This would be consistent with a preliminary observation about how long it takes to solve mathematical conjectures: while inference is rendered difficult by the exponential growth in the global population and of mathematicians, the distribution of time-to-solution roughly matches a memoryless exponential distribution (one with a constant chance of solving it in any time period) rather than a more intuitive distribution like a type 1 survivorship curve (where a conjecture gets easier to solve over time, perhaps as related mathematical knowledge accumulates), suggesting a model of mathematical activity in which many independent random attempts are made, each with a small chance of success, and eventually one succeeds. This idea of extensive unconscious computation neatly accords with Poincaré’s account of mathematical creativity in which after long fruitless effort (preparation), he abandoned the problem for a time and engaged in ordinary activities (incubation), is suddenly struck by an answer or insight, and then verifies its correctness consciously. The existence of an incubation effect seems confirmed by psychological studies and particular the observation that incubation effects increase with the time allowed for incubation & also if the subject does not undertake demanding mental tasks during the incubation period (see Sio & Ormerod 2009), and is consistent with extensive unconscious computation. Some of this computation may happen during sleep; sleep & cognition have long been associated in a murky fashion (“sleep on it”), but it may have to do with reviewing the events of the day & difficult tasks, with relevant memories reinforced or perhaps more thinking going on. I’ve seen more than one suggestion of this, and mathematician Richard K. Guy suggests this as well.27 (It’s unclear how many results occur this way; Stanislaw Ulam mentions finding one result but never again28; J Thomas mentions one success but one failure by a teacher29; R. W. Thomason dreamed of a dead friend making a clearly false claim and published material based on his disproof of the ghost’s claim30; and Leonard Eugene Dickson reportedly had a useful dream & an early survey of 69 mathematicians yielded 63 nulls, 5 low-quality results, and 1 hit31.)
Heuristics, however, do not generalize, and fail outside their particular domain. Are we fortunate enough that the domain mathematicians work in is—deliberately or accidentally—just that domain in which their heuristics/intuition succeeds? Sandberg suggests not:
Unfortunately I suspect that the connoisseurship of mathematicians for truth might be local to their domain. I have discussed with friends about how “brittle” different mathematical domains are, and our consensus is that there are definitely differences between logic, geometry and calculus. Philosophers also seem to have a good nose for what works or doesn’t in their domain, but it doesn’t seem to carry over to other domains. Now moving outside to applied domains things get even trickier. There doesn’t seem to be the same “nose for truth” in risk assessment, perhaps because it is an interdisciplinary, messy domain. The cognitive abilities that help detect correct decisions are likely local to particular domains, trained through experience and maybe talent (ie. some conformity between neural pathways and deep properties of the domain). The only thing that remains is general-purpose intelligence, and that has its own limitations.
Leslie Lamport advocates for machine-checked proofs and a more rigorous style of proofs similar to natural deduction, noting a mathematician acquaintance guesses at a broad error rate of 1⁄332 and that he routinely found mistakes in his own proofs and, worse, believed false conjectures33.
We can probably add software to that list: early software engineering work found that, dismayingly, bug rates seem to be simply a function of lines of code, and one would expect diseconomies of scale. So one would expect that in going from the ~4,000 lines of code of the Microsoft DOS operating system kernel to the ~50,000,000 lines of code in Windows Server 2003 (with full systems of applications and libraries being even larger: the comprehensive Debian repository in 2007 contained ~323,551,126 lines of code) that the number of active bugs at any time would be… fairly large. Mathematical software is hopefully better, but practitioners still run into issues (eg. Durán et al 2014, Fonseca et al 2017) and I don’t know of any research pinning down how buggy key mathematical systems like Mathematica are or how much published mathematics may be erroneous due to bugs. This general problem led to predictions of doom and spurred much research into automated proof-checking, static analysis, and functional languages34.
The doom, however, did not manifest and arguably operating systems & applications are more reliable in the 2000s+ than they were in the 1980–1990s35 (eg. the general disappearance of the Blue Screen of Death). Users may not appreciate this point, but programmers who happen to think one day of just how the sausage of Gmail is made—how many interacting technologies and stacks of formats and protocols are involved—may get the shakes and wonder how it could ever work, much less be working at that moment. The answer is not really clear: it seems to be a combination of abundant computing resources driving down per-line error rates by avoiding optimization, modularization reducing interactions between lines, greater use of testing invoking an adversarial attitude to one’s code, and a light sprinkling of formal methods & static checks36.
While hopeful, it’s not clear how many of these would apply to existential risks: how does one use randomized testing on theories of, or tradeoff code clarity for computing performance?
So we might forgive case 1 errors entirely: if a community of mathematicians take an ‘incorrect’ proof about a particular 37, then we may be more confident in it38 than lumping the two error rates together. Case 2 errors are the problem, and they can sometimes be systematic. Most dramatically, when an entire group of papers with all their results turn out to be wrong since they made a since-disproved assumption:and ratify it (either by verifying the proof subconsciously or seeing what their heuristics say), it not being written out because it would be tedious too
In the 1970s and 1980s, mathematicians discovered that framed manifolds with Arf-Kervaire invariant equal to 1—oddball manifolds not surgically related to a sphere—do in fact exist in the first five dimensions on the list: 2, 6, 14, 30 and 62. A clear pattern seemed to be established, and many mathematicians felt confident that this pattern would continue in higher dimensions…Researchers developed what Ravenel calls an entire “cosmology” of conjectures based on the assumption that manifolds with Arf-Kervaire invariant equal to 1 exist in all dimensions of the form 2n − 2. Many called the notion that these manifolds might not exist the “Doomsday Hypothesis,” as it would wipe out a large body of research. Earlier this year, Victor Snaith of the University of Sheffield in England published a book about this research, warning in the preface, “…this might turn out to be a book about things which do not exist.”
Just weeks after Snaith’s book appeared, Hopkins announced on April 21 that Snaith’s worst fears were justified: that Hopkins, Hill and Ravenel had proved that no manifolds of Arf-Kervaire invariant equal to 1 exist in dimensions 254 and higher. Dimension 126, the only one not covered by their analysis, remains a mystery. The new finding is convincing, even though it overturns many mathematicians’ expectations, Hovey said.39
The parallel postulate is another fascinating example of mathematical error of the second kind; its history is replete with false proofs even by greats like Lagrange (on what strike the modern reader as bizarre grounds)40, self-deception, and misunderstandings—Giovanni Girolamo Saccheri developed a non-Euclidean geometry flawlessly but concluded it was flawed:
The second possibility turned out to be harder to refute. In fact he was unable to derive a logical contradiction and instead derived many non-intuitive results; for example that triangles have a maximum finite area and that there is an absolute unit of length. He finally concluded that: “the hypothesis of the acute angle is absolutely false; because it is repugnant to the nature of straight lines”. Today, his results are theorems of hyperbolic geometry.
We could look upon Type II errors as having a benevolent aspect: they show both that our existing methods are too weak & informal and that our intuition/heuristics break down at it—implying that all previous mathematical effort has been systematically misled in avoiding that area (as empty), and that there is much low-hanging fruit. (Consider how many scores or hundreds of key theorems were proven by the very first mathematicians to work in the non-Euclidean geometries!)
Should such widely-believed conjectures as P ≠ NP41 or the Riemann hypothesis turn out be false, then because they are assumed by so many existing proofs, entire textbook chapters (and perhaps textbooks) would disappear—and our previous estimates of error rates will turn out to have been substantial underestimates. But it may be a cloud with a silver lining: it is not what you don’t know that’s dangerous, but what you know that ain’t so.
“A credo of sorts”; Vaughan Jones (Truth in Mathematics, 1998), pg208–209:
Proofs are indispensable, but I would say they are necessary but not sufficient for mathematical truth, at least truth as perceived by the individual.
To justify this attitude let me invoke two experiences of current mathematics, which very few mathematicians today have escaped.
The first is computer programming. To write a short program, say 100 lines of C code, is a relatively painless experience. The debugging will take longer than the writing, but it will not entail suicidal thoughts. However, should an inexperienced programmer undertake to write a slightly longer program, say 1000 lines, distressing results will follow. The debugging process becomes an emotional nightmare in which one will doubt one’s own sanity. One will certainly insult the compiler in words that are inappropriate for this essay. The mathematician, having gone through this torture, cannot but ask: “Have I ever subjected the proofs of any of my theorems to such close scrutiny?” In my case at least the answer is surely “no”. So while I do not doubt that my proofs are correct (at least the important ones), my belief in the results needs bolstering. Compare this with the debugging process. At the end of debugging we are happy with our program because of the consistency of the output it gives, not because we feel we have proved it correct—after all we did that at least twenty times while debugging and we were wrong every time. Why not a twenty-first? In fact we are acutely aware that our poor program has only been tested with a limited set of inputs and we fully expect more bugs to manifest themselves when inputs are used which we have not yet considered. If the program is sufficiently important, it will be further debugged in the course of time until it becomes secure with respect to all inputs. (With much larger programs this will never happen.) So it is with our theorems. Although we may have proofs galore and a rich surrounding structure, if the result is at all difficult it is only the test of time that will cause acceptance of the “truth” of the result.
The second experience concerning the need for supplements to proof is one which I used to dislike intensely, but have come to appreciate and even search for. It is the situation where one has two watertight, well-designed arguments—that lead inexorably to opposite conclusions. Remember that research in mathematics involves a foray into the unknown. We may not know which of the two conclusions is correct or even have any feeling or guess. Proof at this point is our only arbiter. And it seems to have let us down. I have known myself to be in this situation for months on end. It induces obsessive and anti-social behaviour. Perhaps we have found an inconsistency in mathematics. But no, eventually some crack is seen in one of the arguments and it begins to look more and more shaky. Eventually we kick ourselves for being so utterly stupid and life goes on. But it was no tool of logic that saved us. The search for a chink in the armour often involved many tricks including elaborate thought experiments and perhaps computer calculations. Much structural understanding is created, which is why I now so value this process. One’s feeling of having obtained truth at the end is approaching the absolute. Though I should add that I have been forced to reverse the conclusion on occasions…
Examples like the ABC conjecture being the exceptions that prove the rule.↩︎
- Bell, E.T.: 1951, “The Queen of Mathematics”, reprinted in J. R. Newman (ed.), The World of Mathematics, Simon and Schuster (1956)
- De Milo, R. Lipton, and A. Perlis: 1979, “Social Processes and Proofs of Theorems and Programs”, Communication of the ACM, Vol. 22, No. 5. Reprinted in T. Tymozcko (ed.), New Directions in the Philosophy of Mathematics, Princeton University Press (1998). Page numbers refer to the book
As a pragmatist & empiricist, I must have the temerity to disagree with the likes of Plato about the role of proof: if mathematical proof truly was so reliable, then I would have little to write about in this essay. However rigorous logic is, it is still created & used by fallible humans. There is no ‘Platonia’ we can tap into to obtain transcendent truth.↩︎
There are various delusions (eg. Cotard delusion), false memory syndromes, compulsive lying (pseudologia fantastica), disorders provoking confabulation such as the general symptom of anosognosia; in a dramatic example of how the mind is what the brain does, some anosognosia can be temporarily cured by squirting cold water in an ear; from “The Apologist and the Revolutionary”:
Take the example of the woman discussed in Lishman’s Organic Psychiatry. After a right-hemisphere stroke, she lost movement in her left arm but continuously denied it. When the doctor asked her to move her arm, and she observed it not moving, she claimed that it wasn’t actually her arm, it was her daughter’s. Why was her daughter’s arm attached to her shoulder? The patient claimed her daughter had been there in the bed with her all week. Why was her wedding ring on her daughter’s hand? The patient said her daughter had borrowed it. Where was the patient’s arm? The patient “turned her head and searched in a bemused way over her left shoulder”…In any case, a patient who has been denying paralysis for weeks or months will, upon having cold water placed in the ear, admit to paralysis, admit to having been paralyzed the past few weeks or months, and express bewilderment at having ever denied such an obvious fact. And then the effect wears off, and the patient not only denies the paralysis but denies ever having admitted to it.
“Mathematics in the Age of the Turing Machine”, Hales 2014:
As an example, we will calculate the expected number of soft errors in one of the mathematical calculations of Section 1.17. The Atlas Project calculation of the E8 character table was a 77 hour calculation that required 64 gigabytes RAM [Ada07]. Soft errors rates are generally measured in units of failures-in-time (FIT). One FIT is defined as one error per 109 hours of operation. If we assume a soft error rate of 103 FIT per Mbit, (which is a typical rate for a modern memory device operating at sea level 15 [Tez04]), then we would expect there to be about 40 soft errors in memory during the calculation:
This example shows that soft errors can be a realistic concern in mathematical calculations. (As added confirmation, the E8 calculation has now been repeated about 5 times with identical results.)…The soft error rate is remarkably sensitive to elevation; a calculation in Denver produces about three times more soft errors than the same calculation on identical hardware in Boston…Soft errors are depressing news in the ultra-reliable world of proof assistants. Alpha particles rain on perfect and imperfect software alike. In fact, because the number of soft errors is proportional to the execution time of a calculation, by being slow and methodical, the probability of a soft error during a calculation inside a proof assistant can be much higher than the probability when done outside.
If a contradiction were now actually found in arithmetic—that would only prove that an arithmetic with such a contradiction in it could render very good service; and it would be better for us to modify our concept of the certainty required, than to say it would really not yet have been a proper arithmetic.
Saul Kripke, reconstructing a Wittgensteinian skeptical argument, points out one way to react to such issues:
A skeptical solution of a philosophical problem begins… by conceding that the skeptic’s negative assertions are unanswerable. Nevertheless our ordinary practice or belief is justified because-contrary appearances notwithstanding-it need not require the justification the sceptic has shown to be untenable. And much of the value of the sceptical argument consists precisely in the fact that he has shown that an ordinary practice, if it is to be defended at all, cannot be defended in a certain way.
- the transcendality of 2√2 and eπ: resolved as predicted, but >78 years faster than he predicted.
- proof of the consistency of arithmetic: prediction that arithmetic was consistent and this was provable was falsified (Goedel showing it is unprovable)
One could add to this Hilbert list: the continuum hypothesis (independent); and the algorithm for solving Diophantines (impossible to give, to the surprise of Georg Kreisel who said reviewing one of the papers “Well, that’s not the way it’s gonna go.”). From MathOverflow:
Hilbert’s 21st problem, on the existence of linear DEs with prescribed monodromy group, was for a long time thought to have been solved by Plemelj in 1908. In fact, Plemelj died in 1967 still believing he had solved the problem. However, in 1989, Bolibruch discovered a counterexample. Details are in the book The Riemann-Hilbert Problem by Anosov and Bolibruch (Vieweg-Teubner 1994), and a nice popular recounting of the story is in Ben Yandell’s The Honors Class (A K Peters 2002).
Lipton also provides as examples:
Warren Hirsch’s polytope conjecture
Subhash Khot’s conjecture that his Unique Games problem is NP-hard (not falsified but substantially weakened)
the search for a proof of Euclid’s fifth postulate (covered already)
George Pólya’s prime factorization conjecture
Euler’s generalization of Fermat’s last theorem
Virginia Ragsdale’s combinatorial conjecture, related to a Hilbert problem
Erik Zeeman’s knot-tying conjecture; the resolution is too good to not quote:
After trying to prove this for almost ten years, one day he worked on the opposite direction, and solved it in hours.
a von Neumann topological conjecture
conventional wisdom in complexity theory “that bounded-width programs could not compute the majority function, and many other functions”
ditto, “Most believed that nondeterministic logspace (NLOG) is not closed under complement.”
Béla Julesz’s human vision statistics conjecture
John “The Mathematician” (1947):,
That Euclid’s axiomatization does at some minor points not meet the modern requirements of absolute axiomatic rigour is of lesser importance in this respect…The first formulations of the calculus were not even mathematically rigorous. An inexact, semi-physical formulation was the only one available for over a hundred and fifty years after Newton! And yet, some of the most important advances of analysis took place during this period, against this inexact, mathematically inadequate background! Some of the leading mathematical spirits of the period were clearly not rigorous, like Euler; but others, in the main, were, like Gauss or Jacobi. The development was as confused and ambiguous as can be, and its relation to empiricism was certainly not according to our present (or Euclid’s) ideas of abstraction and rigour. Yet no mathematician would want to exclude it from the fold—that period produced mathematics as first-class as ever existed! And even after the reign of rigour was essentially re-established with Cauchy, a very peculiar relapse into semi-physical methods took place with Riemann.
Stephen Wolfram mentions a recent example I hadn’t run into used before, in a long discussion of expanding Mathematica to automatically incorporate old papers’ results
Of course, there are all sorts of practical issues. Newer papers are predominantly in TeX, so it’s not too difficult to pull out theorems with all their mathematical notation. But older papers need to be scanned, which requires math OCR, which has yet to be properly developed. Then there are issues like whether theorems stated in papers are actually valid. And even whether theorems that were considered valid, say, 100 years ago are still considered valid today. For example, for continued fractions, there are lots of pre-1950 theorems that were successfully proved in their time, but which ignore branch cuts, and so wouldn’t be considered correct today. And in the end of course it requires lots of actual, skilled mathematicians to guide the curation process, and to encode theorems. But in a sense this kind of mobilization of mathematicians is not completely unfamiliar; it’s something like what was needed when Zentralblatt was started in 1931, or Mathematical Reviews in 1941.
The history of mathematics is full of philosophically and ethically troubling reports about bad proofs of theorems. For example, the fundamental theorem of algebra states that every polynomial of degree n with complex coefficients has exactly n complex roots. D’Alembert published a proof in 1746, and the theorem became known “D’Alembert’s theorem”, but the proof was wrong. Gauss published his first proof of the fundamental theorem in 1799, but this, too, had gaps. Gauss’s subsequent proofs, in 1816 and 1849, were OK. It seems to have been hard to determine if a proof of the fundamental theorem of algebra was correct. Why?
Poincaré was awarded a prize from King Oscar II of Sweden and Norway for a paper on the three-body problem, and his paper was published in Acta Mathematica in 1890. But the published paper was not the prize-winning paper. The paper that won the prize contained serious mistakes, and Poincare and other mathematicians, most importantly, Mittag-Leffler, engaged in a conspiracy to suppress the truth and to replace the erroneous paper with an extensively altered and corrected one.
The three-body problem is fascinating as it gives us an example of a bad proof by Poincaré & attempt to cover it up, but also an example of a impossibility proof: Bruns & Poincaré proved in 1887 that the usual approaches could not work, typically interpreted as the 3 or n-body problem being unsolvable. Except in 1906/1909, Karl F. Sundman provided an (impractical) algorithm using different techniques to solve it. See “The Solution of the n-body Problem” & “A Visit to the Newtonian N-body Problem via Elementary Complex Variables”.↩︎
“Mathematics in the Age of the Turing Machine”, Hales 2014:
Why use computers to verify mathematics? The simple answer is that carefully implemented proof checkers make fewer errors than mathematicians (except J.-P. Serre). Incorrect proofs of correct statements are so abundant that they are impossible to catalogue. Ralph Boas, former executive editor of Math Reviews, once remarked that proofs are wrong “half the time” [Aus08]. Kempe’s claimed proof of the four-color theorem stood for more than a decade before Heawood refuted it [Mac01, p. 115]. “More than a thousand false proofs [of Fermat’s Last Theorem] were published between 1908 and 1912 alone” [Cor10]. Many published theorems are like the hanging chad ballots of the 2000 U.S. presidential election, with scrawls too ambivalent for a clear yea or nay. One mathematician even proposed to me that a new journal is needed that unlike the others only publishes reliable results. Euclid gave us a method, but even he erred in the proof of the very first proposition of the Elements when he assumed without proof that two circles, each passing through the other’s center, must intersect. The concept that is needed to repair the gap in Euclid’s reasoning is an intermediate value theorem. This defect was not remedied until Hilbert’s Foundations of Geometry. Examples of widely accepted proofs of false or unprovable statements show that our methods of proof-checking are far from perfect. Lagrange thought he had a proof of the parallel postulate, but had enough doubt in his argument to withhold it from publication. In some cases, entire schools have become sloppy, such as the Italian school of algebraic geometry or real analysis before the revolution in rigor towards the end of the nineteenth century. Plemelj’s 1908 accepted solution to Hilbert’s 21st problem on the monodromy of linear differential equations was refuted in 1989 by Bolibruch. Auslander gives the example of a theorem12 published by Waraskiewicz in 1937, generalized by Choquet in 1944, then refuted with a counterexample by Bing in 1948 [Aus08]. Another example is the approximation problem for Sobolev maps between two manifolds [Bet91], which contains a faulty proof of an incorrect statement. The corrected theorem appears in [HL03]. Such examples are so plentiful that a Wiki page has been set up to classify them, with references to longer discussions at Math Overflow [Wik11], [Ove09], [Ove10].
“Computational Discovery in Pure Mathematics”, Simon Colton 2007:
A more recent example was the discovery that Andrew Wiles’ original proof of Fermat’s Last Theorem was flawed (but not, as it turned out, fatally flawed, as Wiles managed to fix the problem (Singh, 1997))…More recently, Larry Wos has been using Otter to find smaller proofs of theorems than the current ones. To this end, he uses Otter to find more succinct methods than those originally proposed. This often results in detecting double negations and removing unnecessary lemmas, some of which were thought to be indispensable. (Wos, 1996) presents a methodology using a strategy known as resonance to search for elegant proofs with Otter. He gives examples from mathematics and logic, and also argues that this work also implications for other fields such as circuit design.
(Fleuriot & Paulson, 1998) have studied the geometric proofs in Newton’s Principia and investigated ways to prove them automatically with the Isabelle interactive theorem prover (Paulson, 1994). To do this, they formalized the Principia in both Euclidean geometry and non-standard analysis. While working through one of the key results (proposition 11 of book 1, the Kepler problem) they discovered an anomaly in the reasoning. Newton was appealing to a cross-multiplication result which wasn’t true for infinitesimals or infinite numbers. Isabelle could therefore not prove the result, but Fleuriot managed to derive an alternative proof of the theorem that the system found acceptable.
Colton 2007: “For example, Heawood discovered a flaw in Kempe’s 1879 proof of the four colour theorem,2 which had been accepted for 11 years.” It would ultimately be proved with a computer in 1976—maybe.↩︎
Theorems that are calculations or enumerations are especially prone to error. Feynman laments, “I don’t notice in the morass of things that something, a little limit or sign, goes wrong… . . I have mathematically proven to myself so many things that aren’t true” [Fey00, p. 885]. Elsewhere, describes two teams of physicists who carried out a two-year calculation of the electron magnetic moment and independently arrived at the same predicted value. When experiment disagreed with prediction, the discrepancy was eventually traced to an arithmetic error made by the physicists, whose calculations were not so independent as originally believed [Fey85, p. 117]. Pontryagin and Rokhlin erred in computing stable homotopy groups of spheres. Little’s tables of knots from 1885 contains duplicate entries that went undetected until 1974. In enumerative geometry, in 1848, Steiner counted 7776 plane conics tangent to 5 general plane conics, when there are actually only 3264. One of the most persistent blunders in the history of mathematics has been the misclassification (or misdefinition) of convex Archimedean polyhedra. Time and again, the pseudo rhombic cuboctahedron has been overlooked or illogically excluded from the classification (Figure 21) [Grue11].
“Fine Hall in its golden age: Remembrances of Princeton in the early fifties”, Indiscrete Thoughts↩︎
There are 2 20th century mathematicians, born too late to work with Faraday, and the telegraph inventor Samuel Morse who while overlapping with Faraday, has a Wikipedia entry mentioning no work in mathematics; I do not know which Morse may be meant.↩︎
An example of this would be “An Enduring Error”, Branko Grünbaum:
Mathematical truths are immutable, but mathematicians do make errors, especially when carrying out non-trivial enumerations. Some of the errors are “innocent”—plain mistakes that get corrected as soon as an independent enumeration is carried out. For example, Daublebsky  in 1895 found that there are precisely 228 types of configurations (123), that is, collections of 12 lines and 12 points, each incident with three of the others. In fact, as found by Gropp  in 1990, the correct number is 229. Another example is provided by the enumeration of the uniform tilings of the 3-dimensional space by Andreini  in 1905; he claimed that there are precisely 25 types. However, as shown  in 1994, the correct number is 28. Andreini listed some tilings that should not have been included, and missed several others—but again, these are simple errors easily corrected…It is surprising how errors of this type escape detection for a long time, even though there is frequent mention of the results. One example is provided by the enumeration of 4-dimensional simple polytopes with 8 facets, by Brückner  in 1909. He replaces this enumeration by that of 3-dimensional “diagrams” that he interpreted as Schlegel diagrams of convex 4-polytopes, and claimed that the enumeration of these objects is equivalent to that of the polytopes. However, aside from several “innocent” mistakes in his enumeration, there is a fundamental error: While to all 4-polytopes correspond 3-dimensional diagrams, there is no reason to assume that every diagram arises from a polytope. At the time of Brückner’s paper, even the corresponding fact about 3-polyhedra and 2-dimensional diagrams has not yet been established—this followed only from Steinitz’s characterization of complexes that determine convex polyhedra , . In fact, in the case considered by Brückner, the assumption is not only unjustified, but actually wrong: One of Brückner’s polytopes does not exist, see .
…Polyhedra have been studied since antiquity. It is, therefore, rather surprising that even concerning some of the polyhedra known since that time there is a lot of confusion, regarding both terminology and essence. But even more unexpected is the fact that many expositions of this topic commit serious mathematical and logical errors. Moreover, this happened not once or twice, but many times over the centuries, and continues to this day in many printed and electronic publications; the most recent case is in the second issue for 2008 of this journal…With our understandings and exclusions, there are fourteen convex polyhedra that satisfy the local criterion and should be called “Archimedean”, but only thirteen that satisfy the global criterion and are appropriately called “uniform” (or “semiregular”). Representatives of the thirteen uniform convex polyhedra are shown in the sources mentioned above, while the fourteenth polyhedron is illustrated in Figure 1. It satisfies the local criterion but not the global one, and therefore is—in our terminology—Archimedean but not uniform. The history of the realization that the local criterion leads to fourteen polyhedra will be discussed in the next section; it is remarkable that this development occurred only in the 20th century. This implies that prior to the twentieth century all enumerations of the polyhedra satisfying the local criterion were mistaken. Unfortunately, many later enumerations make the same error.
Dana Mackinzie, The Universe in Zero Words: The Story of Mathematics as Told through Equations (as quoted by John D. Cook):
Fermat repeatedly bragged about the n = 3 and n = 4 cases and posed them as challenges to other mathematicians … But he never mentioned the general case, n = 5 and higher, in any of his letters. Why such restraint? Most likely, Weil argues, because Fermat had realized that his “truly wonderful proof” did not work in those cases…Every mathematician has had days like this. You think you have a great insight, but then you go out for a walk, or you come back to the problem the next day, and you realize that your great idea has a flaw. Sometimes you can go back and fix it. And sometimes you can’t.
Much additional progress was made over the next 150 years, but no completely general result had been obtained. Buoyed by false confidence after his proof that pi is transcendental, the mathematician Lindemann proceeded to publish several proofs of Fermat’s Last Theorem, all of them invalid (Bell 1937, pp. 464–465). A prize of 100000 German marks, known as the Wolfskehl Prize, was also offered for the first valid proof (Ball and Coxeter 1987, p. 72; Barner 1997; Hoffman 1998, pp. 193–194 and 199).
A recent false alarm for a general proof was raised by Y. Miyaoka (Cipra 1988) whose proof, however, turned out to be flawed. Other attempted proofs among both professional and amateur mathematicians are discussed by vos Savant (1993), although vos Savant erroneously claims that work on the problem by Wiles (discussed below) is invalid.
To take a random example (which could be multiplied indefinitely); from Gödel and the Nature of Mathematical Truth: A Talk with Rebecca Goldstein (6.8.2005):
Einstein told the philosopher of science Hans Reichenbach that he’d known even before the solar eclipse of 1918 supported his general theory of relativity that the theory must be true because it was so beautiful. And Hermann Weyl, who worked on both relativity theory and quantum mechanics, said “My work always tried to unite the true with the beautiful, but when I had to choose one or the other, I usually chose the beautiful.”…Mathematics seems to be the one place where you don’t have to choose, where truth and beauty are always united. One of my all-time favorite books is A Mathematician’s Apology. G.H. Hardy tries to demonstrate to a general audience that mathematics is intimately about beauty. He gives as examples two proofs, one showing that the square root of 2 is irrational, the other showing that there’s no largest prime number. Simple, easily graspable proofs, that stir the soul with wonder.
Nathanson 2009 claims the opposite:
Many mathematicians have the opposite opinion; they do not or cannot distinguish the beauty or importance of a theorem from its proof. A theorem that is first published with a long and difficult proof is highly regarded. Someone who, preferably many years later, finds a short proof is “brilliant.” But if the short proof had been obtained in the beginning, the theorem might have been disparaged as an “easy result.” Erdős was a genius at finding brilliantly simple proofs of deep results, but, until recently, much of his work was ignored by the mathematical establishment.
From “Aesthetics as a Liberating Force in Mathematics Education?”, by Nathalie Sinclair (reprinted in The Best Writing on Mathematics 2010, ed. Mircea Pitici); pg208:
There is a long tradition in mathematics of describing proofs and theorems in aesthetic terms, often using words such as ‘elegance’ and ‘depth’. Further, mathematicians have also argued that their subject is more akin to an art than it is to a science (see Hardy, 1967; Littlewood, 1986; Sullivan 1925/1956), and, like the arts, ascribe to mathematics aesthetic goals. For example, the mathematician W. Krull (1930 / 1987) writes: “the primary goals of the mathematician are aesthetic, and not epistemological” (p. 49). This statement seems contradictory with the oft-cited concern of mathematics with finding or discovering truths, but it emphasises the fact that the mathematician’s interest is in expressing truth, and in doing so in clever, simple, succinct ways.
While Krull focuses on mathematical expression, the mathematician H. Poincare (1908 / 1966) concerns himself with the psychology of mathematical invention, but he too underlines the aesthetic dimension of mathematics, not the logical. In Poincare’s theory, a large part of a mathematician’s work is done at the subconscious level, where an aesthetic sensibility is responsible for alerting the mathematicians to the most fruitful and interesting of ideas. Other mathematicians have spoken of this special sensibility as well and also in terms of the way it guides mathematicians to choose certain problems. This choice is essential in mathematic given that there exists no external reality against which mathematicians can decide which problems or which branches of mathematics are important (see , 1947 [“The Mathematician”]): the choice involves human values and preference—and, indeed, these change over time, as exemplified by the dismissal of geometry by some prominent mathematicians in the early 20th century (see Whiteley, 1999).
- Littlewood, 1986: “The mathematician’s art of work”; in B. Bollobas (ed.), Littlewood’s miscellany, Cambridge University press
- Sullivan 1925/1956: “Mathematics as an art”; in J. Newman (ed.), The world of mathematics, vol 3 (p 2015–2021)
From pg 211–212, Sinclair 2009:
The survey of mathematicians conducted by Wells (1990) provides a more empirically-based challenge to the intrinsic view of the mathematical aesthetic. Wells obtained responses from over 80 mathematicians, who were asked to identify the most beautiful theorem from a given set of 24 theorems. (These theorems were chosen because they were ‘famous’, in the sense that Wells judged them to be well-known by most mathematicians, and of interest to the discipline in general, rather than to a particular subfield.) Wells finds that the mathematicians varied widely in their judgments. More interestingly, in explaining their choices, the mathematicians revealed a wide range of personal responses affecting their aesthetic responses to the theorems. Wells effectively puts to rest the belief that mathematicians have some kind of secret agreement on what counts as beautiful in mathematics…Burton’s (2004) work focuses on the practices of mathematicians and their understanding of those practices. Based on extensive interviews with a wide range of mathematicians…She points out that mathematicians range on a continuum from unimportant to crucial in terms of their positionings on the role of the aesthetic, with only 3 of the 43 mathematicians dismissing its importance. For example, one said “Beauty doesn’t matter. I have never seen a beautiful mathematical paper in my life” (p. 65). Another mathematician was initially dismissive about mathematical beauty but later, when speaking about the review process, said: “If it was a very elegant way of doing things, I would be inclined to forgive a lot of faults” (p. 65).
- Burton, Leone (2004): Mathematicians as enquirers: Learning about learning mathematics; Dordrecht: Kluwer Academic Publishers
Tao left a lengthy comment on a previously linked Lipton post:
It is possible to gather reasonably convincing support for a conjecture by a variety of means, long before it is actually proven, though many mathematicians are reluctant to argue too strongly based on such support due to the lack of rigour or the risk of embarrassment in hindsight. Examples of support include:
- Numerical evidence; but one has to be careful in situations where the null hypothesis would also give comparable numerical evidence. The first ten trillion zeroes of zeta on the critical line is, in my opinion, only mild evidence in favour of RH (the null hypothesis may be, for instance, that the zeroes go haywire once log log t becomes sufficiently large); but the numerical data on spacings of zeroes is quite convincing evidence for the GUE hypothesis, in my view. (It is a priori conceivable that the spacings are distributed according to GUE plus another correction that dominates when log log t (say) is large, but this begins to strain Occam’s razor.)
- Non-trivial special cases. But it depends on how “representative” one believes the special cases to be. For instance, if one can verify low-dimensional cases of a conjecture that is true in high dimensions, this is usually only weak (but not entirely insignificant) evidence, as it is possible that there exist high-dimensional pathologies that sink the conjecture but cannot be folded up into a low-dimensional situation. But if one can do all odd-dimensional cases, and all even-dimensional cases up to dimension 8 (say), then that begins to look more convincing.
- Proofs of parallel, analogous, or similar conjectures. Particularly if these proofs were non-trivial and led to new insights and techniques. RH in function fields is a good example here; it raises the hope of some sort of grand unified approach to GRH that somehow handles all number fields (or some other general class) simultaneously.
- Converse of the conjecture is provable, and looks “optimal” somehow. One might be able to construct a list of all obvious examples of objects with property X, find significant difficulty extending the list, and then conjecture that this is list is complete. This is a common way to make conjectures, but can be dangerous, as one may simply have a lack of imagination. So this is thin evidence by itself (many false conjectures have arisen from this converse-taking method), but it does carry a little bit of weight once combined with other strands of evidence.
- Conjecture is ambitious and powerful, and yet is not immediately sunk by the obvious consistency checks. This is vaguely analogous to the concept of a “falsifiable theory” in science. A strong conjecture could have many powerful consequences in a variety of disparate areas of mathematics—so powerful, in fact, that one would not be surprised that they could be disproven with various counterexamples. But surprisingly, when one checks the cases that one does understand quite well, the conjecture holds up. A typical example here might include a very general conjectured identity which, when specialised to various well-understood special cases, become a provable identity—but with the identity in each special case being provable by very different methods, and the connection between all the identities being mysterious other than via the conjecture. The general conjecture that the primes behave pseudorandomly after accounting for small moduli is an example of such a conjecture; we usually can’t control how the primes behave, but when we can, the pseudorandomess heuristic lines up perfectly.
- Attempts at disproof run into interesting obstacles. This one is a bit hard to formalise, but sometimes you can get a sense that attempts to disprove a conjecture are failing not due to one’s own lack of ability, or due to accidental contingencies, but rather due to “enemy activity”; some lurking hidden structure to the problem, corners of which emerge every time one tries to build a counterexample. The question is then whether this “enemy” is stupid enough to be outwitted by a sufficiently clever counterexample, or is powerful enough to block all such attempts. Identifying this enemy precisely is usually the key to resolving the conjecture (or transforming the conjecture into a stronger and better conjecture).
- Conjecture generalises to a broader conjecture that enjoys support of the types listed above. The twin prime conjecture, by itself, is difficult to support on its own; but when it comes with an asymptotic that one can then verify numerically to high accuracy and is a consequence of the much more powerful prime tuples conjecture (and more generally, the pseudorandomness heuristic for the primes) which is supported both because of its high falsifiability and also its nature as an optimal-looking converse (the only structure to the primes are the “obvious” structures), it becomes much more convincing. Another textbook example is the Poincare conjecture, which became much more convincing after being interpreted as a special case of geometrisation (which had a lot of support, eg. the two-dimensional analogue, Haken manifolds, lots of falsifiable predictions, etc.).
It can be fun (though a little risky, reputation-wise) to debate how strong various pieces of evidence really are, but one soon reaches a point of diminishing returns, as often we are limited by our own ignorance, lack of imagination, or cognitive biases. But we are at least reasonably able to perform relative comparisons of the strength of evidence of two conjectures in the same topic (I guess complexity theory is full of instances of this…).
pg190–191 of Fascinating Mathematical People, edited by Albers 2011:
Guy: If I do any mathematics at all I think I do it in my sleep.
MP: Do you think a lot of mathematicians work that way?
Guy: I do. Yes. The human brain is a remarkable thing, and we are a long way from understanding how it works. For most mathematical problems, immediate thought and pencil and paper—the usual things one associates with solving mathematical problems—are just totally inadequate. You need to understand the problem, make a few symbols on paper and look at them. Most of us, as opposed to Erdős who would probably give an answer to a problem almost immediately, would then probably have to go off to bed, and, if we’re lucky, when we wake up in the morning, we would already have some insight into the problem. On those rare occasions when I have such insight, I quite often don’t know that I have it, but when I come to work on the problem again, to put pencil to paper, somehow the ideas just seem to click together, and the thing goes through. It is clear to me that my brain must have gone on, in an almost combinatorial way, checking the cases or doing an enormous number of fairly trivial arithmetical computations. It seems to know the way to go. I first noticed this with chess endgames, which are indeed finite combinatorial problems. The first indication that I was interested in combinatorics—I didn’t know I had the interest, and I didn’t even know there was such a subject as combinatorics—was that I used to compose chess endgames. I would sit up late into the night trying to analyze a position. Eventually I would sink into slumber and wake up in the morning to realize that if I had only moved the pawns over one file the whole thing would have gone through clearly. My brain must have been checking over this finite but moderately large number of possibilities during the night. I think a lot of mathematicians must work that way.
MP: Have you talked to any other mathematicians about that?
Guy: No. But in Jacques Hadamard’s book on invention in the mathematical field, he quotes some examples there where it is fairly clear that people do that kind of thing. There was someone earlier this week who was talking about Jean-Paul Serre. He said that if you ask Serre a question he either gives you the answer immediately, or, if he hesitates, and you push him in any way, he will say, “How can I think about the question when I don’t know the answer?” I thought that was a lovely remark. At a much lower level, one should think, “What shape should the answer be?” Then your mind can start checking whether you’re right and how to find some logical sequence to get you where you want to go.
Once in my life I had a mathematical dream which proved correct. I was twenty years old. I thought, my God, this is wonderful, I won’t have to work, it will all come in dreams! But it never happened again.
Once after I had spent several days trying to prove a topology theorem, I dreamed about it and woke up with as counterexample. In the dream it just constructed itself, and I could see it. I didn’t have a fever then, though. Later one of my teachers, an old Polish woman, explained her experience. She kept a notebook by her bed so she could write down any insights she got in her sleep. She woke up in the night with a wonderful proof, and wrote it down, and in the morning when she looked at it it was all garbage. “You cannot do math in your sleep. You will have to work.”
“Higher algebraic K-theory of schemes and of derived categories”, Thomason & Trobaugh 1990:
The first author must state that his coauthor and close friend, Tom Trobaugh, quite intelligent, singularly original, and inordinately generous, killed himself consequent to endogenous depression. 94 days later, in my dream, Tom’s simulacrum remarked, “The direct limit characterization of perfect complexes shows that they extend, just as one extends a coherent sheaf.” Awaking with a start, I knew this idea had to be wrong, since some perfect complexes have a non-vanishing K0 obstruction to extension. I had worked on this problem for 3 years, and saw this approach to be hopeless. But Tom’s simulacrum had been so insistent, I knew he wouldn’t let me sleep undisturbed until I had worked out the argument and could point to the gap. This work quickly led to the key results of this paper. To Tom, I could have explained why he must be listed as a coauthor.
Let us come to mathematicians. One of them, Maillet, started a first inquiry as to their methods of work. One famous question, in particular, was already raised by him that of the “mathematical dream”, it having been suggested often that the solution of problems that have defied investigation may appear in dreams. Though not asserting the absolute non-existence of “mathematical dreams”, Maillet’s inquiry shows that they cannot be considered as having a serious importance. Only one remarkable observation is reported by the prominent American mathematician, Leonard Eugene Dickson, who can positively assert its accuracy…Except for that very curious case, most of the 69 correspondents who answered Maillet on that question never experienced any mathematical dream (I never did) or, in that line, dreamed of wholly absurd things, or were unable to state precisely the question they happened to dream of. 5 dreamed of quite naive arguments. There is one more positive answer; but it is difficult to take account of it, as its author remains anonymous.
From his 1993 “How to Write a Proof”:
Anecdotal evidence suggests that as many as a third of all papers published in mathematical journals contain mistakes—not just minor errors, but incorrect theorems and proofs…My information about mathematicians’ errors and embarrassment comes mainly from George Bergman.
1993 “How to Write a Proof”:
Some twenty years ago, I decided to write a proof of the Schroeder-Bernstein theorem for an introductory mathematics class. The simplest proof I could find was in Kelley’s classic general topology text [4, page 28]. Since Kelley was writing for a more sophisticated audience, I had to add a great deal of explanation to his half-page proof. I had written five pages when I realized that Kelley’s proof was wrong. Recently, I wanted to illustrate a lecture on my proof style with a convincing incorrect proof, so I turned to Kelley. I could find nothing wrong with his proof; it seemed obviously correct! Reading and rereading the proof convinced me that either my memory had failed, or else I was very stupid twenty years ago. Still, Kelley’s proof was short and would serve as a nice example, so I started rewriting it as a structured proof. Within minutes, I rediscovered the error.
My interest in proofs stems from writing correctness proofs of algorithms. These proofs are seldom deep, but usually have considerable detail. Structured proofs provided a way of coping with this detail. The style was first applied to proofs of ordinary theorems in a paper I wrote with Martin Abadi . He had already written conventional proofs|proofs that were good enough to convince us and, presumably, the referees. Rewriting the proofs in a structured style, we discovered that almost every one had serious mistakes, though the theorems were correct. Any hope that incorrect proofs might not lead to incorrect theorems was destroyed in our next collaboration . Time and again, we would make a conjecture and write a proof sketch on the blackboard—a sketch that could easily have been turned into a convincing conventional proof—only to discover, by trying to write a structured proof, that the conjecture was false. Since then, I have never believed a result without a careful, structured proof. My skepticism has helped avoid numerous errors.
“How to Write a 21st Century Proof”, Lamport 2011:
My earlier paper on structured proofs described how effective they are at catching errors. It recounted how only by writing such a proof was I able to re-discover an error in a proof of the Schroeder-Bernstein theorem in a well-known topology text [2, page 28]. I recently received email from a mathematician saying that he had tried unsuccessfully to find that error by writing a structured proof. I asked him to send me his proof, and he responded:
I tried typing up the proof that I’d hand-written, and in the process, I think I’ve found the fundamental error. . . I now really begin to understand what you mean about the power of this method, even if it did take me hours to get to this point!
It is instructive that, to find the error, he had to re-write his proof to be read by someone else. Eliminating errors requires care.
Twenty years ago it was reasonable to predict that the size and ambition of software products would be severely limited by the unreliability of their component programs. Crude estimates suggest that professionally written programs delivered to the customer can contain between one and ten independently correctable errors per thousand lines of code; and any software error in principle can have spectacular effect (or worse: a subtly misleading effect) on the behaviour of the entire system. Dire warnings have been issued..The arguments were sufficiently persuasive to trigger a significant research effort devoted to the problem of program correctness. A proportion of this research was based on the ideal of certainty achieved by mathematical proof.
Fortunately, the problem of program correctness has turned out to be far less serious than predicted. A recent analysis by Mackenzie has shown that of several thousand deaths so far reliably attributed to dependence on computers, only ten or so can be explained by errors in the software: most of these were due to a couple of instances of incorrect dosage calculations in the treatment of cancer by radiation. Similarly predictions of collapse of software due to size have been falsified by continuous operation of real-time software systems now measured in tens of millions of lines of code, and subjected to thousands of updates per year…And aircraft, both civil and military, are now flying with the aid of software measured in millions of lines—though not all of it is safety-critical. Compilers and operating systems of a similar size now number their satisfied customers in millions. So the questions arise: why have twenty years of pessimistic predictions been falsified? Was it due to successful application of the results of the research which was motivated by the predictions? How could that be, when clearly little software has ever has been subjected to the rigours of formal proof?
Success in the use of mathematics for specification, design and code reviews does not require strict formalisation of all the proofs. Informal reasoning among those who are fluent in the idioms of mathematics is extremely efficient, and remarkably reliable. It is not immune from failure; for example simple misprints can be surprisingly hard to detect by eye. Fortunately, these are exactly the kind of error that can be removed by early tests. More formal calculation can be reserved for the most crucial issues, such as interrupts and recovery procedures, where bugs would be most dangerous, expensive, and most difficult to diagnose by tests…Many more tests should be designed than there will ever be time to conduct; they should be generated as directly as possible from the specification, preferably automatically by computer program. Random selection at the last minute will protect against the danger that under pressure of time the program will be adapted to pass the tests rather than meeting the rest of its specification. There is some evidence that early attention to a comprehensive and rigorous test strategy can improve reliability of a delivered product, even when at the last minute there was no time to conduct the tests before delivery!
The missing steps may be quite difficult to fully prove, though; Nathanson 2009:
There is a lovely but probably apocryphal anecdote about Norbert Weiner. Teaching a class at MIT, he wrote something on the blackboard and said it was ‘obvious.’ One student had the temerity to ask for a proof. Weiner started pacing back and forth, staring at what he had written on the board and saying nothing. Finally, he left the room, walked to his office, closed the door, and worked. After a long absence he returned to the classroom. ‘It is obvious’, he told the class, and continued his lecture.
What conditions count as full scrutiny by the math community may not be too clear; Nathanson 2009 trenchantly mocks math talks:
Social pressure often hides mistakes in proofs. In a seminar lecture, for example, when a mathematician is proving a theorem, it is technically possible to interrupt the speaker in order to ask for more explanation of the argument. Sometimes the details will be forthcoming. Other times the response will be that it’s “obvious” or “clear” or “follows easily from previous results.” Occasionally speakers respond to a question from the audience with a look that conveys the message that the questioner is an idiot. That’s why most mathematicians sit quietly through seminars, understanding very little after the introductory remarks, and applauding politely at the end of a mostly wasted hour.
“Mathematicians solve 45-year-old Kervaire invariant puzzle”, Erica Klarreich 2009↩︎
“Why Did Lagrange ‘Prove’ the Parallel Postulate?”, Grabiner 2009:
It is true that Lagrange never did publish it, so he must have realized there was something wrong. In another version of the story, told by Jean-Baptiste Biot, who claims to have been there (though the minutes do not list his name), everybody there could see that something was wrong, so Lagrange’s talk was followed by a moment of complete silence [2, p. 84]. Still, Lagrange kept the manuscript with his papers for posterity to read.
Why work on it at all?
The historical focus on the fifth postulate came because it felt more like the kind of thing that gets proved. It is not self-evident, it requires a diagram even to explain, so it might have seemed more as though it should be a theorem. In any case, there is a tradition of attempted proofs throughout the Greek and then Islamic and then eighteenth-century mathematical worlds. Lagrange follows many eighteenth-century mathematicians in seeing the lack of a proof of the fifth postulate as a serious defect in Euclid’s Elements. But Lagrange’s criticism of the postulate in his manuscript is unusual. He said that the assumptions of geometry should be demonstrable “just by the principle of contradiction”-the same way, he said, that we know the axiom that the whole is greater than the part [32, p. 30R]. The theory of parallels rests on something that is not self-evident, he believed, and he wanted to do something about this.
What was the strange and alien to the modern mind approach that Lagrange used?
Recall that Lagrange said in this manuscript that axioms should follow from the principle of contradiction. But, he added, besides the principle of contradiction, “There is another principle equally self-evident,” and that is Leibniz’s principle of sufficient reason. That is: nothing is true “unless there is a sufficient reason why it should be so and not otherwise” [42, p. 31; italics added]. This, said Lagrange, gives as solid a basis for mathematical proof as does the principle of contradiction [32, p. 30V]. But is it legitimate to use the principle of sufficient reason in mathematics? Lagrange said that we are justified in doing this, because it has already been done. For example, Archimedes used it to establish that equal weights at equal distances from the fulcrum of a lever balance. Lagrange added that we also use it to show that three equal forces acting on the same point along lines separated by a third of the circumference of a circle are in equilibrium [32, pp. 31R-31V]…The modern reader may object that Lagrange’s symmetry arguments are, like the uniqueness of parallels, equivalent to Euclid’s postulate. But the logical correctness, or lack thereof, of Lagrange’s proof is not the point. (In this manuscript, by the way, Lagrange went on to give an analogous proof—also by the principle of sufficient reason-that between two points there is just one straight line, because if there were a second straight line on one side of the first, we could then draw a third straight line on the other side, and so on [32, pp. 34R-34V]. Lagrange, then, clearly liked this sort of argument.)
…Why did philosophers conclude that space had to be infinite, homogeneous, and the same in all directions? Effectively, because of the principle of sufficient reason. For instance, Giordano Bruno in 1600 argued that the universe must be infinite because there is no reason to stop at any point; the existence of an infinity of worlds is no less reasonable than the existence of a finite number of them. Descartes used similar reasoning in his Principles of Philosophy: “We recognize that this world. . . has no limits in its extension. . . . Wherever we imagine such limits, we . . . imagine beyond them some indefinitely extended space” [28, p. 104]. Similar arguments were used by other seventeenth-century authors, including Newton. Descartes identified space and the extension of matter, so geometry was, for him, about real physical space. But geometric space, for Descartes, had to be Euclidean…Descartes, some 50 years before Newton published his first law of motion, was a co-discoverer of what we call linear inertia: that in the absence of external influences a moving body goes in a straight line at a constant speed. Descartes called this the first law of nature, and for him, this law follows from what we now recognize as the principle of sufficient reason. Descartes said, “Nor is there any reason to think that, if [a part of matter] moves. . . and is not impeded by anything, it should ever by itself cease to move with the same force” [30, p. 75]…Leibniz, by contrast, did not believe in absolute space. He not only said that spatial relations were just the relations between bodies, he used the principle of sufficient reason to show this. If there were absolute space, there would have to be a reason to explain why two objects would be related in one way if East is in one direction and West in the opposite direction, and related in another way if East and West were reversed [24, p. 147]. Surely, said Leibniz, the relation between two objects is just one thing! But Leibniz did use arguments about symmetry and sufficient reason-sufficient reason was his principle, after all. Thus, although Descartes and Leibniz did not believe in empty absolute space and Newton did, they all agreed that what I am calling the Euclidean properties of space are essential to physics.
…In his 1748 essay “Reflections on Space and Time”, Euler argued that space must be real; it cannot be just the relations between bodies as the Leibnizians claim . This is because of the principles of mechanics—that is, Newton’s first and second laws. These laws are beyond doubt, because of the “marvelous” agreement they have with the observed motions of bodies. The inertia of a single body, Euler said, cannot possibly depend on the behavior of other bodies. The conservation of uniform motion in the same direction makes sense, he said, only if measured with respect to immovable space, not to various other bodies. And space is not in our minds, said Euler; how can physics-real physics-depend on something in our minds?…in his Critique of Pure Reason of 1781, Kant placed space in the mind nonetheless. We order our perceptions in space, but space itself is in the mind, an intuition of the intellect. Nevertheless, Kant’s space turned out to be Euclidean too. Kant argued that we need the intuition of space to prove theorems in geometry. This is because it is in space that we make the constructions necessary to prove theorems. And what theorem did Kant use as an example? The sum of the angles of a triangle is equal to two right angles, a result whose proof requires the truth of the parallel postulate [26, “Of space,” p. 423]…Lagrange himself is supposed to have said that spherical trigonometry does not need Euclid’s parallel postulate [4, pp. 52–53]. But the surface of a sphere, in the eighteenth-century view, is not non-Euclidean; it exists in 3-dimensional Euclidean space [20, p. 71]. The example of the sphere helps us see that the eighteenth-century discussion of the parallel postulate’s relationship to the other postulates is not really about what is logically possible, but about what is true of real space.
The final step:
Johann Heinrich Lambert was one of the mathematicians who worked on the problem of Postulate 5. Lambert explicitly recognized that he had not been able to prove it, and considered that it might always have to remain a postulate. He even briefly suggested a possible geometry on a sphere with an imaginary radius. But Lambert also observed that the parallel postulate is related to the law of the lever [20, p. 75]. He said that a lever with weightless arms and with equal weights at equal distances is balanced by a force in the opposite direction at the center equal to the sum of the weights, and that all these forces are parallel. So either we are using the parallel postulate, or perhaps, Lambert thought, some day we could use this physical result to prove the parallel postulate…These men did not want to do mechanics, as, say, Newton had done. They wanted to show not only that the world was this way, but that it necessarily had to be. A modern philosophical critic, Helmut Pulte, has said that Lagrange’s attempt to “reduce” mechanics to analysis strikes us today as “a misplaced endeavour to mathematize. . . an empirical science, and thus to endow it with infallibility” [39, p. 220]. Lagrange would have responded, “Right! That’s just exactly what we are all doing.”
Much of CS theory would disappear. In my own research some of Ken’s and my “best” results would survive, but many would be destroyed. The Karp-Lipton Theorem is gone in this world. Ditto all “dichotomy” results between P and NP-complete, and for P = #P, Jin-Yi’s similar work. Many barrier results, such as oracle theorems and natural proofs, lose their main motivation, while much fine structure in hardness-versus-randomness tradeoffs would be blown up. The PCP Theorem and all the related work is gone. Modern cryptography could survive if the algorithm were galactic, but otherwise would be in trouble. I am currently teaching Complexity Theory at Tech using the textbook by Sanjeev Arora and Boaz Barak…Most of the 573 pages of Arora-Barak would be gone:
- Delete all of chapter 3 on NP.
- Delete all of chapter 5 on the polynomial hierarchy.
- Delete most of chapter 6 on circuits.
- Delete all of chapter 7 on probabilistic computation.
- Mark as dangerous chapter 9 on cryptography.
- Delete most of chapter 10 on quantum computation—who would care about Shor’s algorithm then?
- Delete all of chapter 11 on the PCP theorem.
I will stop here. Most of the initial part of the book is gone. The same for much of Homer-Selman, and basically all of the “Reducibility and Completeness” CRC chapter.