The Existential Risk of Math Errors

Mathematical mistake/error-rates limit our understanding of rare risks and ability to defend against them
philosophy, transhumanism, statistics, survey, insight-porn
2012-07-202019-08-18 finished certainty: likely importance: 10

How empir­i­cally cer­tain can we be in any use of math­e­mat­i­cal rea­son­ing to make empir­i­cal claims? In con­trast to errors in many other forms of knowl­edge such as med­i­cine or psy­chol­o­gy, which have enor­mous lit­er­a­tures clas­si­fy­ing and quan­ti­fy­ing error rates, rich meth­ods of meta-­analy­sis and pool­ing expert belief, and much one can say about the prob­a­bil­ity of any result being true, math­e­mat­i­cal error has been rarely exam­ined except as a pos­si­bil­ity and a moti­vat­ing rea­son for research into for­mal meth­ods. There is lit­tle known beyond anec­dotes about how often pub­lished proofs are wrong, in what ways they are wrong, the impact of such errors, how errors vary by sub­field, what meth­ods decrease (or increase) errors, and so on. Yet, math­e­mat­ics is surely not immune to error, and for all the rich­ness of the sub­ject, math­e­mati­cians can usu­ally agree at least infor­mally on what has turned out to be right or wrong1, or good by other cri­te­ria like fruit­ful­ness or beau­ty. Gaif­man 2004 claims that errors are com­mon but any such analy­sis would be uned­i­fy­ing:

An agent might even have beliefs that log­i­cally con­tra­dict each oth­er. Mersenne believed that 267-1 is a prime num­ber, which was proved false in 1903, cf. Bell (1951). [The fac­tor­iza­tion, dis­cov­ered by Cole, is: 193,707,721 × 761,838,257,287.]…Now, there is no short­age of deduc­tive errors and of false math­e­mat­i­cal beliefs. Mersen­ne’s is one of the most known in a rich his­tory of math­e­mat­i­cal errors, involv­ing very promi­nent fig­ures (cf. De Millo et al. 1979, 269–270). The explo­sion in the num­ber of math­e­mat­i­cal pub­li­ca­tions and research reports has been accom­pa­nied by a sim­i­lar explo­sion in erro­neous claims; on the whole, errors are noted by small groups of experts in the area, and many go unheed­ed. There is noth­ing philo­soph­i­cally inter­est­ing that can be said about such fail­ures.2

I dis­agree. Quan­ti­ta­tive approaches can­not cap­ture every­thing, but why should we believe math­e­mat­ics is, unlike so many other fields like med­i­cine, uniquely unquan­tifi­able and inef­fa­bly inscrutable? As a non-­math­e­mati­cian look­ing at math­e­mat­ics largely as a black box, I think such errors are quite inter­est­ing, for sev­eral rea­sons: given the exten­sive role of math­e­mat­ics through­out the sci­ences, errors have seri­ous poten­tial impact; but in col­lect­ing all the anec­dotes I have found, the impact seems skewed towards errors in qua­si­-­for­mal proofs but not the actual results; and this may tell us some­thing about what it is that math­e­mati­cians do sub­con­sciously when they “do math” or why con­jec­ture res­o­lu­tion times are expo­nen­tial­ly-dis­trib­uted or what the role of for­mal meth­ods ought to be or what we should think about prac­ti­cally impor­tant but unre­solved prob­lems like P=NP.

Untrustworthy proofs

“Beware of bugs in the above code; I have only proved it cor­rect, not tried it.”

Don­ald Knuth

“When you have elim­i­nated the impos­si­ble, what­ever remains is often more improb­a­ble than your hav­ing made a mis­take in one of your impos­si­bil­ity proofs.”

Steven Kaas

In some respects, there is noth­ing to be said; in other respects, there is much to be said. dis­cusses a basic issue with : any use­ful dis­cus­sion will be rig­or­ous, hope­fully with physics and math proofs; but proofs them­selves are empir­i­cally unre­li­able. Given that math­e­mat­i­cal proofs have long been claimed to be the most reli­able form of epis­te­mol­ogy humans know and the only way to guar­an­tee truth3, this sets a basic upper bound on how much con­fi­dence we can put on any belief, and given the lurk­ing exis­tence of sys­tem­atic bias­es, it may even be pos­si­ble for there to be too much evi­dence for a claim (). There are other rare risks, from men­tal dis­eases4 to hard­ware errors5 to how to deal with con­tra­dic­tions6, but we’ll look at math­e­mat­i­cal error.

Error distribution

“When I asked what it was, he said, ‘It is the prob­a­bil­ity that the test bomb will ignite the whole atmos­phere.’ I decided I would check it myself! The next day when he came for the answers I remarked to him, ‘The arith­metic was appar­ently cor­rect but I do not know about the for­mu­las for the cap­ture cross sec­tions for oxy­gen and nitro­gen—after all, there could be no exper­i­ments at the needed energy lev­els.’ He replied, like a physi­cist talk­ing to a math­e­mati­cian, that he wanted me to check the arith­metic not the physics, and left. I said to myself, ‘What have you done, Ham­ming, you are involved in risk­ing all of life that is known in the Uni­verse, and you do not know much of an essen­tial part?’ I was pac­ing up and down the cor­ri­dor when a friend asked me what was both­er­ing me. I told him. His reply was, ‘Never mind, Ham­ming, no one will ever blame you.’”


“…of the two major ther­monu­clear cal­cu­la­tions made that sum­mer at Berke­ley, they got one right and .”

, 2020

This upper bound on our cer­tainty may force us to dis­re­gard cer­tain rare risks because the effect of error on our esti­mates of exis­ten­tial risks is asym­met­ric: an error will usu­ally reduce the risk, not increase it. The errors are not dis­trib­uted in any kind of sym­met­ri­cal around a mean: an exis­ten­tial risk is, by def­i­n­i­tion, bump­ing up against the upper bound on pos­si­ble dam­age. If we were try­ing to esti­mate, say, aver­age human height, and errors were dis­trib­uted like a bell curve, then we could ignore them. But if we are cal­cu­lat­ing the risk of a super-as­teroid impact which will kill all of human­i­ty, an error which means the super-as­teroid will actu­ally kill human­ity twice over is irrel­e­vant because it’s the same thing (we can’t die twice); how­ev­er, the mir­ror error—the super-as­teroid actu­ally killing half of human­i­ty—­mat­ters a great deal!

XKCD #809 “Los Alamos”

How big is this upper bound? Math­e­mati­cians have often made errors in proofs. But it’s for ideas to be accepted for a long time and then reject­ed. But we can divide errors into 2 basic cases cor­re­spond­ing to :

  1. Mis­takes where the the­o­rem is still true, but the proof was incor­rect (type I)
  2. Mis­takes where the the­o­rem was false, and the proof was also nec­es­sar­ily incor­rect (type II)

Before some­one comes up with a final answer, a math­e­mati­cian may have many lev­els of intu­ition in for­mu­lat­ing & work­ing on the prob­lem, but we’ll con­sider the final end-prod­uct where the math­e­mati­cian feels sat­is­fied that he has solved it. Case 1 is per­haps the most com­mon case, with innu­mer­able exam­ples; this is some­times due to mis­takes in the proof that any­one would accept is a mis­take, but many of these cases are due to chang­ing stan­dards of proof. For exam­ple, when David Hilbert dis­cov­ered errors in Euclid’s proofs which no one noticed before, the the­o­rems were still true, and the gaps more due to Hilbert being a mod­ern math­e­mati­cian think­ing in terms of for­mal sys­tems (which of course Euclid did not think in). (David Hilbert him­self turns out to be a use­ful exam­ple of the other kind of error: his famous was accom­pa­nied by def­i­nite opin­ions on the out­come of each prob­lem and some­times tim­ings, sev­eral of which were wrong or ques­tion­able7.) Sim­i­lar­ly, early cal­cu­lus used ‘infin­i­tes­i­mals’ which were some­times treated as being 0 and some­times treated as an indef­i­nitely small non-zero num­ber; this was inco­her­ent and strictly speak­ing, prac­ti­cally all of the cal­cu­lus results were wrong because they relied on an inco­her­ent con­cep­t—but of course the results were some of the great­est math­e­mat­i­cal work ever con­ducted8 and when later math­e­mati­cians put cal­cu­lus on a more rig­or­ous foot­ing, they imme­di­ately re-derived those results (some­times with impor­tant qual­i­fi­ca­tion­s), and doubt­less as mod­ern math evolves other fields have some­times needed to go back and clean up the foun­da­tions and will in the future.9 Other cases are more straight­for­ward, with math­e­mati­cians pub­lish­ing mul­ti­ple proofs/patches10 or covertly cor­rect­ing papers11. Some­times they make it into text­books: real­ized that his proof for , which is still open, was wrong only after 2 read­ers saw it in his 1914 text­book The The­ory of Num­bers and ques­tioned it. Attempts to for­mal­ize results into exper­i­men­tal­ly-ver­i­fi­able results (in the case of physic­s-re­lated math) or machine-checked proofs, or at least some sort of soft­ware form, some­times turns up issues with12 accepted13 results14, although not always impor­tant (eg the cor­rec­tion in Romero & Rubio 2013). Poin­caré points out this math­e­mat­i­cal ver­sion of the in “Intu­ition and Logic in Math­e­mat­ics”:

Strange! If we read over the works of the ancients we are tempted to class them all among the intu­ition­al­ists. And yet nature is always the same; it is hardly prob­a­ble that it has begun in this cen­tury to cre­ate minds devoted to log­ic. If we could put our­selves into the flow of ideas which reigned in their time, we should rec­og­nize that many of the old geome­ters were in ten­dency ana­lysts. Euclid, for exam­ple, erected a sci­en­tific struc­ture wherein his con­tem­po­raries could find no fault. In this vast con­struc­tion, of which each piece how­ever is due to intu­ition, we may still today, with­out much effort, rec­og­nize the work of a logi­cian.

… What is the cause of this evo­lu­tion? It is not hard to find. Intu­ition can not give us rigour, nor even cer­tain­ty; this has been rec­og­nized more and more. Let us cite some exam­ples. We know there exist con­tin­u­ous func­tions lack­ing deriv­a­tives. Noth­ing is more shock­ing to intu­ition than this propo­si­tion which is imposed upon us by log­ic. Our fathers would not have failed to say: “It is evi­dent that every con­tin­u­ous func­tion has a deriv­a­tive, since every curve has a tan­gent.” How can intu­ition deceive us on this point?

… I shall take as sec­ond exam­ple on which rest so many the­o­rems of math­e­mat­i­cal physics; today we estab­lish it by rea­son­ings very rig­or­ous but very long; hereto­fore, on the con­trary, we were con­tent with a very sum­mary proof. A cer­tain inte­gral depend­ing on an arbi­trary func­tion can never van­ish. Hence it is con­cluded that it must have a min­i­mum. The flaw in this rea­son­ing strikes us imme­di­ate­ly, since we use the abstract term func­tion and are famil­iar with all the sin­gu­lar­i­ties func­tions can present when the word is under­stood in the most gen­eral sense. But it would not be the same had we used con­crete images, had we, for exam­ple, con­sid­ered this func­tion as an elec­tric poten­tial; it would have been thought legit­i­mate to affirm that elec­tro­sta­tic equi­lib­rium can be attained. Yet per­haps a phys­i­cal com­par­i­son would have awak­ened some vague dis­trust. But if care had been taken to trans­late the rea­son­ing into the lan­guage of geom­e­try, inter­me­di­ate between that of analy­sis and that of physics, doubt­less this dis­trust would not have been pro­duced, and per­haps one might thus, even today, still deceive many read­ers not fore­warned.

…A first ques­tion presents itself. Is this evo­lu­tion end­ed? Have we finally attained absolute rigour? At each stage of the evo­lu­tion our fathers also thought they had reached it. If they deceived them­selves, do we not like­wise cheat our­selves?

We believe that in our rea­son­ings we no longer appeal to intu­ition; the philoso­phers will tell us this is an illu­sion. Pure logic could never lead us to any­thing but tau­tolo­gies; it could cre­ate noth­ing new; not from it alone can any sci­ence issue. In one sense these philoso­phers are right; to make arith­metic, as to make geom­e­try, or to make any sci­ence, some­thing else than pure logic is nec­es­sary.

Isaac New­ton, inci­den­tal­ly, gave two proofs of the same solu­tion to a prob­lem in prob­a­bil­i­ty, one via enu­mer­a­tion and the other more abstract; the enu­mer­a­tion was cor­rect, but the other proof totally wrong and this was not noticed for a long time, lead­ing Stigler to remark:15

If New­ton fooled him­self, he evi­dently took with him a suc­ces­sion of read­ers more than 250 years lat­er. Yet even they should feel no embar­rass­ment. As once wrote, “Every­one makes errors in prob­a­bil­i­ties, at times, and big ones.” (Graves, 1889, page 459)

Type I > Type II?

“Lef­schetz was a purely intu­itive math­e­mati­cian. It was said of him that he had never given a com­pletely cor­rect proof, but had never made a wrong guess either.”

Gian-­Carlo Rota16

Case 2 is dis­turbing, since it is a case in which we wind up with false beliefs and also false beliefs about our beliefs (we no longer know that we don’t know). Case 2 could lead to extinc­tion.

The preva­lence of case 1 might lead us to be very pes­simistic; case 1, case 2, what’s the dif­fer­ence? We have demon­strated a large error rate in math­e­mat­ics (and physics is prob­a­bly even worse off). Except, errors do not seem to be evenly & ran­domly dis­trib­uted between case 1 and case 2. There seem to be far more case 1s than case 2s, as already men­tioned in the early cal­cu­lus exam­ple: far more than 50% of the early cal­cu­lus results were cor­rect when checked more rig­or­ous­ly. Richard Ham­ming (1998) attrib­utes to a com­ment that while edit­ing that “of the new results in the papers reviewed most are true but the cor­re­spond­ing proofs are per­haps half the time plain wrong”. (WP men­tions as well that “His first math­e­mat­ics pub­li­ca­tion was writ­ten…after he dis­cov­ered an incor­rect proof in another paper.”) gives us an exam­ple with Hilbert:

Once more let me begin with Hilbert. When the Ger­mans were plan­ning to pub­lish Hilbert’s col­lected papers and to present him with a set on the occa­sion of one of his later birth­days, they real­ized that they could not pub­lish the papers in their orig­i­nal ver­sions because they were full of errors, some of them quite seri­ous. There­upon they hired a young unem­ployed math­e­mati­cian, , to go over Hilbert’s papers and cor­rect all mis­takes. Olga labored for three years; it turned out that all mis­takes could be cor­rected with­out any major changes in the state­ment of the the­o­rems. There was one excep­tion, a paper Hilbert wrote in his old age, which could not be fixed; it was a pur­ported proof of the con­tin­uum hypoth­e­sis, you will find it in a vol­ume of the Math­e­ma­tis­che Annalen of the early thir­ties. At last, on Hilbert’s birth­day, a freshly printed set of Hilbert’s col­lected papers was pre­sented to the Geheim­rat. Hilbert leafed through them care­fully and did not notice any­thing.17

So only one of those papers was irrepara­ble, while all the oth­ers were cor­rect and fix­able? Rota him­self expe­ri­enced this:

Now let us shift to the other end of the spec­trum, and allow me to relate another per­sonal anec­dote. In the sum­mer of 1979, while attend­ing a phi­los­o­phy meet­ing in Pitts­burgh, I was struck with a case of detached reti­nas. Thanks to Joni’s prompt inter­ven­tion, I man­aged to be oper­ated on in the nick of time and my eye­sight was saved. On the morn­ing after the oper­a­tion, while I was lying on a hos­pi­tal bed with my eyes ban­daged, Joni dropped in to vis­it. Since I was to remain in that Pitts­burgh hos­pi­tal for at least a week, we decided to write a paper. Joni fished a man­u­script out of my suit­case, and I men­tioned to her that the text had a few mis­takes which she could help me fix. There fol­lowed twenty min­utes of silence while she went through the draft. “Why, it is all wrong!” she finally remarked in her youth­ful voice. She was right. Every state­ment in the man­u­script had some­thing wrong. Nev­er­the­less, after labor­ing for a while, she man­aged to cor­rect every mis­take, and the paper was even­tu­ally pub­lished.

There are two kinds of mis­takes. There are fatal mis­takes that destroy a the­o­ry; but there are also con­tin­gent ones, which are use­ful in test­ing the sta­bil­ity of a the­o­ry.

A math­e­mati­cian of my acquain­tance referred me to pg118 of The Axiom of Choice, Jech 1973; he had found the sus­tained effect of the 5 foot­notes humor­ous:

  1. The result of Prob­lem 11 con­tra­dicts the results announced by Levy [1963b]. Unfor­tu­nate­ly, the con­struc­tion pre­sented there can­not be com­plet­ed.
  2. The trans­fer to ZF was also claimed by Marek [1966] but the out­lined method appears to be unsat­is­fac­tory and has not been pub­lished.
  3. A con­tra­dict­ing result was announced and later with­drawn by Truss [1970].
  4. The exam­ple in Prob­lem 22 is a coun­terex­am­ple to another con­di­tion of Mostowski, who con­jec­tured its suf­fi­ciency and sin­gled out this exam­ple as a test case.
  5. The inde­pen­dence result con­tra­dicts the claim of Fel­gner [1969] that the Cofi­nal­ity Prin­ci­ple implies the Axiom of Choice. An error has been found by Mor­ris (see Fel­gn­er’s cor­rec­tions to [1969]).

And referred me also to the entries in the index of Fourier Analy­sis by Tom Körner con­cern­ing the prob­lem of the “point­wise con­ver­gence of Fourier series”:

Some prob­lems are noto­ri­ous for pro­vok­ing repeated false proofs. attracts count­less cranks and seri­ous attempts, of course, but also amus­ing is appar­ently the Jaco­bian Con­jec­ture:

The (in)­fa­mous Jaco­bian Con­jec­ture was con­sid­ered a the­o­rem since a 1939 pub­li­ca­tion by Keller (who claimed to prove it). Then Sha­fare­vich found a new proof and pub­lished it in some con­fer­ence pro­ceed­ings paper (in early 1950-ies). This con­jec­ture states that any poly­no­mial map from C^2 to C^2 is invert­ible if its Jaco­bian is nowhere zero. In 1960-ies, Vitushkin found a coun­terex­am­ple to all the proofs known to date, by con­struct­ing a com­plex ana­lytic map, not invert­ible and with nowhere van­ish­ing Jaco­bian. It is still a main source of embar­rass­ment for con­trib­u­tors, who pub­lish about 3–5 false proofs year­ly. Here is a funny refu­ta­tion for one of the proofs:

The prob­lem of Jaco­bian Con­jec­ture is very hard. Per­haps it will take human being another 100 years to solve it. Your attempt is noble, Maybe the Gods of Olym­pus will smile on you one day. Do not be too dis­ap­point­ed. B. Sagre has the honor of pub­lish­ing three wrong proofs and C. Cheval­ley mis­takes a wrong proof for a cor­rect one in the 1950’s in his Math Review com­ments, and I.R. Sha­fare­vich uses Jaco­bian Con­jec­ture (to him it is a the­o­rem) as a fact…

This look into the prover­bial sausage fac­tory should not come as a sur­prise to any­one tak­ing an Out­side View: why would­n’t we expect any area of intel­lec­tual endeav­our to have error rates within a few orders of mag­ni­tude as any other area? How absurd to think that the rate might be ~0%; but it’s also a lit­tle ques­tion­able to be as opti­mistic as Anders Sand­berg’s math­e­mati­cian friend: “he responded that he thought a far smaller num­ber [1%] of papers in math were this flawed.”


Other times, the cor­rect result is known and proven, but many are unaware of the answers19. The famous —those that have been solved, any­way—have a long his­tory of failed proofs (Fer­mat surely did not prove & may have real­ized this only after boast­ing20 and nei­ther did 21). What explains this? The guid­ing fac­tor that keeps pop­ping up when math­e­mati­cians make leaps seems to go under the name of ‘ele­gance’ or , which widely con­sid­ered impor­tant222324. This imbal­ance sug­gests that math­e­mati­cians are quite cor­rect when they say proofs are not the heart of math­e­mat­ics and that they pos­sess insight into math, a 6th sense for math­e­mat­i­cal truth, a nose for aes­thetic beauty which cor­re­lates with verac­i­ty: they dis­pro­por­tion­ately go after the­o­rems rather than their nega­tions.

Why this is so, I do not know.

Out­right Pla­ton­ism like Godel appar­ently believed in seems unlike­ly—­math­e­mat­i­cal exper­tise resem­bles a com­plex skill like chess-­play­ing more than it does a sen­sory modal­ity like vision. Pos­si­bly they have well-de­vel­oped heuris­tics and short­-­cuts and they focus on the sub­sets of results on which those heuris­tics work well (the drunk search­ing under the spot­light), or per­haps they do run full rig­or­ous proofs but are doing so sub­con­sciously and merely express them­selves ineptly con­sciously with omis­sions and erro­neous for­mu­la­tions ‘left as an exer­cise for the reader’25.

We could try to jus­tify the heuris­tic par­a­digm by appeal­ing to as-yet poorly under­stood aspects of the brain, like our visual cor­tex: argue that what is going on is that math­e­mati­cians are sub­con­sciously doing tremen­dous amounts of com­pu­ta­tion (like we do tremen­dous amounts of com­pu­ta­tion in a thought as ordi­nary as rec­og­niz­ing a face), which they are unable to bring up explic­it­ly. So after pro­longed intro­spec­tion and some com­par­a­tively sim­ple explicit sym­bol manip­u­la­tion or thought, they feel that a con­jec­ture is true and this is due to a sum­mary of said mas­sive com­pu­ta­tions.

Per­haps they are check­ing many instances? Per­haps they are and look­ing for bound­aries? Could there be some sort of “log­i­cal prob­a­bil­ity” where going down pos­si­ble proof-­paths yield prob­a­bilis­tic infor­ma­tion about the final tar­get the­o­rem, maybe in some sort of of proof-trees? Do serve to con­sol­i­date & prune & mem­o­ries of incom­plete lines of thought, fine­tun­ing heuris­tics or intu­itions for future attacks and get­ting deeper into a prob­lem (per­haps anal­o­gous to )? Read­ing great math­e­mati­cians like dis­cuss the heuris­tics they use on unsolved prob­lems26, they bear some resem­blances to com­puter sci­ence tech­niques. This would be con­sis­tent with a pre­lim­i­nary obser­va­tion about to solve math­e­mat­i­cal con­jec­tures: while infer­ence is ren­dered dif­fi­cult by the expo­nen­tial growth in the global pop­u­la­tion and of math­e­mati­cians, the dis­tri­b­u­tion of time-­to-­so­lu­tion roughly matches a mem­o­ry­less (one with a con­stant chance of solv­ing it in any time peri­od) rather than a more intu­itive dis­tri­b­u­tion like a type 1 (where a con­jec­ture gets eas­ier to solve over time, per­haps as related math­e­mat­i­cal knowl­edge accu­mu­lates), sug­gest­ing a model of math­e­mat­i­cal activ­ity in which many inde­pen­dent ran­dom attempts are made, each with a small chance of suc­cess, and even­tu­ally one suc­ceeds. This idea of exten­sive uncon­scious com­pu­ta­tion neatly accords with Poin­car­é’s account of math­e­mat­i­cal cre­ativ­ity in which after long fruit­less effort (prepa­ra­tion), he aban­doned the prob­lem for a time and engaged in ordi­nary activ­i­ties (), is sud­denly struck by an answer or insight, and then ver­i­fies its cor­rect­ness con­scious­ly. The exis­tence of an incu­ba­tion effect seems con­firmed by psy­cho­log­i­cal stud­ies and par­tic­u­lar the obser­va­tion that incu­ba­tion effects increase with the time allowed for incu­ba­tion & also if the sub­ject does not under­take demand­ing men­tal tasks dur­ing the incu­ba­tion period (see Sio & Ormerod 2009), and is con­sis­tent with exten­sive uncon­scious com­pu­ta­tion. Some of this com­pu­ta­tion may hap­pen dur­ing sleep; sleep & cog­ni­tion have long been asso­ci­ated in a murky fash­ion (“sleep on it”), but it may have to do with review­ing the events of the day & dif­fi­cult tasks, with rel­e­vant rein­forced or per­haps more think­ing going on. I’ve seen more than one sug­ges­tion of this, and math­e­mati­cian sug­gests this as well.27 (It’s unclear how many results occur this way; men­tions find­ing one result but never again28; J Thomas men­tions one suc­cess but one fail­ure by a teacher29; R. W. Thoma­son dreamed of a dead friend mak­ing a clearly false claim and pub­lished mate­r­ial based on his dis­proof of the ghost’s claim30; and report­edly had a use­ful dream & an early sur­vey of 69 math­e­mati­cians yielded 63 nulls, 5 low-qual­ity results, and 1 hit31.)

Heuris­tics, how­ev­er, do not gen­er­al­ize, and fail out­side their par­tic­u­lar domain. Are we for­tu­nate enough that the domain math­e­mati­cians work in is—de­lib­er­ately or acci­den­tal­ly—just that domain in which their heuristics/intuition suc­ceeds? Sand­berg sug­gests not:

Unfor­tu­nately I sus­pect that the con­nois­seur­ship of math­e­mati­cians for truth might be local to their domain. I have dis­cussed with friends about how “brit­tle” dif­fer­ent math­e­mat­i­cal domains are, and our con­sen­sus is that there are def­i­nitely dif­fer­ences between log­ic, geom­e­try and cal­cu­lus. Philoso­phers also seem to have a good nose for what works or does­n’t in their domain, but it does­n’t seem to carry over to other domains. Now mov­ing out­side to applied domains things get even trick­i­er. There does­n’t seem to be the same “nose for truth” in risk assess­ment, per­haps because it is an inter­dis­ci­pli­nary, messy domain. The cog­ni­tive abil­i­ties that help detect cor­rect deci­sions are likely local to par­tic­u­lar domains, trained through expe­ri­ence and maybe tal­ent (i.e. some con­for­mity between neural path­ways and deep prop­er­ties of the domain). The only thing that remains is gen­er­al-pur­pose intel­li­gence, and that has its own lim­i­ta­tions.

advo­cates for machine-checked proofs and a more rig­or­ous style of proofs sim­i­lar to , not­ing a math­e­mati­cian acquain­tance guesses at a broad error rate of 1⁄332 and that he rou­tinely found mis­takes in his own proofs and, worse, believed false con­jec­tures33.

We can prob­a­bly add soft­ware to that list: early soft­ware engi­neer­ing work found that, dis­may­ing­ly, bug rates seem to be sim­ply a func­tion of , and one would expect . So one would expect that in going from the ~4,000 lines of code of the Microsoft DOS oper­at­ing sys­tem ker­nel to the ~50,000,000 lines of code in Win­dows Server 2003 (with full sys­tems of appli­ca­tions and libraries being even larg­er: the com­pre­hen­sive repos­i­tory in 2007 con­tained ~323,551,126 lines of code) that the num­ber of active bugs at any time would be… fairly large. Math­e­mat­i­cal soft­ware is hope­fully bet­ter, but prac­ti­tion­ers still run into issues (eg Durán et al 2014, Fon­seca et al 2017) and I don’t know of any research pin­ning down how buggy key math­e­mat­i­cal sys­tems like Math­e­mat­ica are or how much pub­lished math­e­mat­ics may be erro­neous due to bugs. This gen­eral prob­lem led to pre­dic­tions of doom and spurred much research into auto­mated proof-check­ing, sta­tic analy­sis, and func­tional lan­guages34.

The doom, how­ev­er, did not man­i­fest and arguably oper­at­ing sys­tems & appli­ca­tions are more reli­able in the 2000s+ than they were in the 1980–1990s35 (eg. the gen­eral dis­ap­pear­ance of the ). Users may not appre­ci­ate this point, but pro­gram­mers who hap­pen to think one day of just how the sausage of Gmail is made—how many inter­act­ing tech­nolo­gies and stacks of for­mats and pro­to­cols are involved—­may get the shakes and won­der how it could ever work, much less be work­ing at that moment. The answer is not really clear: it seems to be a com­bi­na­tion of abun­dant com­put­ing resources dri­ving down per-­line error rates by avoid­ing opti­miza­tion, mod­u­lar­iza­tion reduc­ing inter­ac­tions between lines, greater use of test­ing invok­ing an adver­sar­ial atti­tude to one’s code, and a light sprin­kling of for­mal meth­ods & sta­tic checks36.

While hope­ful, it’s not clear how many of these would apply to exis­ten­tial risks: how does one use ran­dom­ized test­ing on the­o­ries of exis­ten­tial risk, or trade­off code clar­ity for com­put­ing per­for­mance?

Type I vs Type II

So we might for­give case 1 errors entire­ly: if a com­mu­nity of math­e­mati­cians take an ‘incor­rect’ proof about a par­tic­u­lar exis­ten­tial risk and rat­ify it (ei­ther by ver­i­fy­ing the proof sub­con­sciously or see­ing what their heuris­tics say), it not being writ­ten out because it would be tedious too37, then we may be more con­fi­dent in it38 than lump­ing the two error rates togeth­er. Case 2 errors are the prob­lem, and they can some­times be sys­tem­at­ic. Most dra­mat­i­cal­ly, when an entire group of papers with all their results turn out to be wrong since they made a since-dis­proved assump­tion:

In the 1970s and 1980s, math­e­mati­cians dis­cov­ered that framed man­i­folds with Arf- equal to 1—od­dball man­i­folds not sur­gi­cally related to a sphere—do in fact exist in the first five dimen­sions on the list: 2, 6, 14, 30 and 62. A clear pat­tern seemed to be estab­lished, and many math­e­mati­cians felt con­fi­dent that this pat­tern would con­tinue in higher dimen­sion­s…Re­searchers devel­oped what Ravenel calls an entire “cos­mol­ogy” of con­jec­tures based on the assump­tion that man­i­folds with Arf-K­er­vaire invari­ant equal to 1 exist in all dimen­sions of the form . Many called the notion that these man­i­folds might not exist the “Dooms­day Hypoth­e­sis,” as it would wipe out a large body of research. Ear­lier this year, Vic­tor Snaith of the Uni­ver­sity of Sheffield in Eng­land pub­lished a book about this research, warn­ing in the pref­ace, “…this might turn out to be a book about things which do not exist.”

Just weeks after Snaith’s book appeared, Hop­kins announced on April 21 that Snaith’s worst fears were jus­ti­fied: that Hop­kins, Hill and Ravenel had proved that no man­i­folds of Arf-K­er­vaire invari­ant equal to 1 exist in dimen­sions 254 and high­er. Dimen­sion 126, the only one not cov­ered by their analy­sis, remains a mys­tery. The new find­ing is con­vinc­ing, even though it over­turns many math­e­mati­cians’ expec­ta­tions, Hovey said.39

The is another fas­ci­nat­ing exam­ple of math­e­mat­i­cal error of the sec­ond kind; its his­tory is replete with false proofs even by greats like (on what strike the mod­ern reader as bizarre grounds)40, self­-de­cep­tion, and mis­un­der­stand­ings— devel­oped a non-­Euclid­ean geom­e­try flaw­lessly but con­cluded it was flawed:

The sec­ond pos­si­bil­ity turned out to be harder to refute. In fact he was unable to derive a log­i­cal con­tra­dic­tion and instead derived many non-in­tu­itive results; for exam­ple that tri­an­gles have a max­i­mum finite area and that there is an absolute unit of length. He finally con­cluded that: “the hypoth­e­sis of the acute angle is absolutely false; because it is repug­nant to the nature of straight lines”. Today, his results are the­o­rems of .

We could look upon Type II errors as hav­ing a benev­o­lent aspect: they show both that our exist­ing meth­ods are too weak & infor­mal and that our intuition/heuristics break down at it—im­ply­ing that all pre­vi­ous math­e­mat­i­cal effort has been sys­tem­at­i­cally mis­led in avoid­ing that area (as emp­ty), and that there is much low-hang­ing fruit. (Con­sider how many scores or hun­dreds of key the­o­rems were proven by the very first math­e­mati­cians to work in the non-­Euclid­ean geome­tries!)

Future implications

Should such wide­ly-­be­lieved con­jec­tures as 41 or the turn out be false, then because they are assumed by so many exist­ing proofs, entire text­book chap­ters (and per­haps text­books) would dis­ap­pear—and our pre­vi­ous esti­mates of error rates will turn out to have been sub­stan­tial under­es­ti­mates. But it may be a cloud with a sil­ver lin­ing: it is not what you don’t know that’s dan­ger­ous, but what you know that ain’t so.

See Also


Jones 1998

“A credo of sorts”; Vaughan Jones (Truth in Math­e­mat­ics, 1998), pg208–209:

Proofs are indis­pens­able, but I would say they are nec­es­sary but not suf­fi­cient for math­e­mat­i­cal truth, at least truth as per­ceived by the indi­vid­ual.

To jus­tify this atti­tude let me invoke two expe­ri­ences of cur­rent math­e­mat­ics, which very few math­e­mati­cians today have escaped.

The first is com­puter pro­gram­ming. To write a short pro­gram, say 100 lines of C code, is a rel­a­tively pain­less expe­ri­ence. The debug­ging will take longer than the writ­ing, but it will not entail sui­ci­dal thoughts. How­ev­er, should an inex­pe­ri­enced pro­gram­mer under­take to write a slightly longer pro­gram, say 1000 lines, dis­tress­ing results will fol­low. The debug­ging process becomes an emo­tional night­mare in which one will doubt one’s own san­i­ty. One will cer­tainly insult the com­piler in words that are inap­pro­pri­ate for this essay. The math­e­mati­cian, hav­ing gone through this tor­ture, can­not but ask: “Have I ever sub­jected the proofs of any of my the­o­rems to such close scruti­ny?” In my case at least the answer is surely “no”. So while I do not doubt that my proofs are cor­rect (at least the sig­nif­i­cant ones), my belief in the results needs bol­ster­ing. Com­pare this with the debug­ging process. At the end of debug­ging we are happy with our pro­gram because of the con­sis­tency of the out­put it gives, not because we feel we have proved it cor­rec­t—after all we did that at least twenty times while debug­ging and we were wrong every time. Why not a twen­ty-­first? In fact we are acutely aware that our poor pro­gram has only been tested with a lim­ited set of inputs and we fully expect more bugs to man­i­fest them­selves when inputs are used which we have not yet con­sid­ered. If the pro­gram is suf­fi­ciently impor­tant, it will be fur­ther debugged in the course of time until it becomes secure with respect to all inputs. (With much larger pro­grams this will never hap­pen.) So it is with our the­o­rems. Although we may have proofs galore and a rich sur­round­ing struc­ture, if the result is at all dif­fi­cult it is only the test of time that will cause accep­tance of the “truth” of the result.

The sec­ond expe­ri­ence con­cern­ing the need for sup­ple­ments to proof is one which I used to dis­like intense­ly, but have come to appre­ci­ate and even search for. It is the sit­u­a­tion where one has two water­tight, well-de­signed argu­ments—that lead inex­orably to oppo­site con­clu­sions. Remem­ber that research in math­e­mat­ics involves a foray into the unknown. We may not know which of the two con­clu­sions is cor­rect or even have any feel­ing or guess. Proof at this point is our only arbiter. And it seems to have let us down. I have known myself to be in this sit­u­a­tion for months on end. It induces obses­sive and anti-­so­cial behav­iour. Per­haps we have found an incon­sis­tency in math­e­mat­ics. But no, even­tu­ally some crack is seen in one of the argu­ments and it begins to look more and more shaky. Even­tu­ally we kick our­selves for being so utterly stu­pid and life goes on. But it was no tool of logic that saved us. The search for a chink in the armour often involved many tricks includ­ing elab­o­rate thought exper­i­ments and per­haps com­puter cal­cu­la­tions. Much struc­tural under­stand­ing is cre­at­ed, which is why I now so value this process. One’s feel­ing of hav­ing obtained truth at the end is approach­ing the absolute. Though I should add that I have been forced to reverse the con­clu­sion on occa­sions…

  1. Exam­ples like the ABC con­jec­ture being the excep­tions that prove the rule.↩︎

  2. Cita­tions:

  3. As a prag­ma­tist & empiri­cist, I must have the temer­ity to dis­agree with the likes of Plato about the role of proof: if math­e­mat­i­cal proof truly was so reli­able, then I would have lit­tle to write about in this essay. How­ever rig­or­ous logic is, it is still cre­ated & used by fal­li­ble humans. There is no ‘Pla­to­nia’ we can tap into to obtain tran­scen­dent truth.↩︎

  4. There are var­i­ous delu­sions (eg. ), s, com­pul­sive lying (), dis­or­ders pro­vok­ing such as the gen­eral symp­tom of ; in a dra­matic exam­ple of how the mind is what the brain does, some anosog­nosia can be tem­porar­ily cured by squirt­ing cold water in an ear; from “The Apol­o­gist and the Rev­o­lu­tion­ary”:

    Take the exam­ple of the woman dis­cussed in Lish­man’s Organic Psy­chi­a­try. After a right-hemi­sphere stroke, she lost move­ment in her left arm but con­tin­u­ously denied it. When the doc­tor asked her to move her arm, and she observed it not mov­ing, she claimed that it was­n’t actu­ally her arm, it was her daugh­ter’s. Why was her daugh­ter’s arm attached to her shoul­der? The patient claimed her daugh­ter had been there in the bed with her all week. Why was her wed­ding ring on her daugh­ter’s hand? The patient said her daugh­ter had bor­rowed it. Where was the patien­t’s arm? The patient “turned her head and searched in a bemused way over her left shoul­der”…In any case, a patient who has been deny­ing paral­y­sis for weeks or months will, upon hav­ing cold water placed in the ear, admit to paral­y­sis, admit to hav­ing been par­a­lyzed the past few weeks or months, and express bewil­der­ment at hav­ing ever denied such an obvi­ous fact. And then the effect wears off, and the patient not only denies the paral­y­sis but denies ever hav­ing admit­ted to it.

  5. , Hales 2014:

    As an exam­ple, we will cal­cu­late the expected num­ber of soft errors in one of the math­e­mat­i­cal cal­cu­la­tions of Sec­tion 1.17. The Atlas Project cal­cu­la­tion of the E8 char­ac­ter table was a 77 hour cal­cu­la­tion that required 64 giga­bytes RAM [Ada07]. Soft errors rates are gen­er­ally mea­sured in units of fail­ures-in-­time (FIT). One FIT is defined as one error per 109 hours of oper­a­tion. If we assume a soft error rate of 103 FIT per Mbit, (which is a typ­i­cal rate for a mod­ern mem­ory device oper­at­ing at sea level 15 [Tez04]), then we would expect there to be about 40 soft errors in mem­ory dur­ing the cal­cu­la­tion:

    This exam­ple shows that soft errors can be a real­is­tic con­cern in math­e­mat­i­cal cal­cu­la­tions. (As added con­fir­ma­tion, the E8 cal­cu­la­tion has now been repeated about 5 times with iden­ti­cal result­s.)…The soft error rate is remark­ably sen­si­tive to ele­va­tion; a cal­cu­la­tion in Den­ver pro­duces about three times more soft errors than the same cal­cu­la­tion on iden­ti­cal hard­ware in Boston…­Soft errors are depress­ing news in the ultra­-re­li­able world of proof assis­tants. Alpha par­ti­cles rain on per­fect and imper­fect soft­ware alike. In fact, because the num­ber of soft errors is pro­por­tional to the exe­cu­tion time of a cal­cu­la­tion, by being slow and method­i­cal, the prob­a­bil­ity of a soft error dur­ing a cal­cu­la­tion inside a proof assis­tant can be much higher than the prob­a­bil­ity when done out­side.

  6. Most/all math results require their sys­tem to be con­sis­tent; but this is one par­tic­u­lar philo­soph­i­cal view. , in :

    If a con­tra­dic­tion were now actu­ally found in arith­metic—that would only prove that an arith­metic with such a con­tra­dic­tion in it could ren­der very good ser­vice; and it would be bet­ter for us to mod­ify our con­cept of the cer­tainty required, than to say it would really not yet have been a proper arith­metic.

    , , points out one way to react to such issues:

    A skep­ti­cal solu­tion of a philo­soph­i­cal prob­lem begins… by con­ced­ing that the skep­tic’s neg­a­tive asser­tions are unan­swer­able. Nev­er­the­less our ordi­nary prac­tice or belief is jus­ti­fied because-­con­trary appear­ances notwith­stand­ing-it need not require the jus­ti­fi­ca­tion the scep­tic has shown to be unten­able. And much of the value of the scep­ti­cal argu­ment con­sists pre­cisely in the fact that he has shown that an ordi­nary prac­tice, if it is to be defended at all, can­not be defended in a cer­tain way.

  7. Lip­ton lists sev­er­al:

    1. the tran­scen­dal­ity of and : resolved as pre­dict­ed, but >78 years faster than he pre­dict­ed.
    2. proof of the con­sis­tency of arith­metic: pre­dic­tion that arith­metic was con­sis­tent and this was prov­able was fal­si­fied (Goedel show­ing it is unprov­able)

    One could add to this Hilbert list: the (in­de­pen­den­t); and the algo­rithm for solv­ing Dio­phan­tines (im­pos­si­ble to give, to the sur­prise of who said review­ing one of the papers “Well, that’s not the way it’s gonna go.”). From Math­Over­flow:

    Hilbert’s 21st prob­lem, on the exis­tence of lin­ear DEs with pre­scribed mon­odromy group, was for a long time thought to have been solved by Plemelj in 1908. In fact, Plemelj died in 1967 still believ­ing he had solved the prob­lem. How­ev­er, in 1989, Boli­bruch dis­cov­ered a coun­terex­am­ple. Details are in the book The Rie­man­n-Hilbert Prob­lem by Anosov and Boli­bruch (Vieweg-­Teub­ner 1994), and a nice pop­u­lar recount­ing of the story is in Ben Yan­del­l’s The Hon­ors Class (A K Peters 2002).

    Lip­ton also pro­vides as exam­ples:

    • War­ren Hirsch’s poly­tope con­jec­ture

    • Sub­hash Khot’s con­jec­ture that his Unique Games prob­lem is NP-hard (not fal­si­fied but sub­stan­tially weak­ened)

    • the search for a proof of Euclid’s fifth pos­tu­late (cov­ered already)

    • George Pólya’s prime fac­tor­iza­tion con­jec­ture

    • Euler’s gen­er­al­iza­tion of Fer­mat’s last the­o­rem

    • Vir­ginia Rags­dale’s com­bi­na­to­r­ial con­jec­ture, related to a Hilbert prob­lem

    • Erik Zee­man’s knot-­ty­ing con­jec­ture; the res­o­lu­tion is too good to not quote:

      After try­ing to prove this for almost ten years, one day he worked on the oppo­site direc­tion, and solved it in hours.

    • a von Neu­mann topo­log­i­cal con­jec­ture

    • con­ven­tional wis­dom in com­plex­ity the­ory “that bound­ed-width pro­grams could not com­pute the major­ity func­tion, and many other func­tions”

    • dit­to, “Most believed that non­de­ter­min­is­tic log­space (NLOG) is not closed under com­ple­ment.”

    • Béla Julesz’s human vision sta­tis­tics con­jec­ture

  8. John von Neu­mann, “The Math­e­mati­cian” (1947):

    That Euclid’s axiom­a­ti­za­tion does at some minor points not meet the mod­ern require­ments of absolute axiomatic rigour is of lesser impor­tance in this respec­t…The first for­mu­la­tions of the cal­cu­lus were not even math­e­mat­i­cally rig­or­ous. An inex­act, semi­-­phys­i­cal for­mu­la­tion was the only one avail­able for over a hun­dred and fifty years after New­ton! And yet, some of the most impor­tant advances of analy­sis took place dur­ing this peri­od, against this inex­act, math­e­mat­i­cally inad­e­quate back­ground! Some of the lead­ing math­e­mat­i­cal spir­its of the period were clearly not rig­or­ous, like Euler; but oth­ers, in the main, were, like Gauss or Jaco­bi. The devel­op­ment was as con­fused and ambigu­ous as can be, and its rela­tion to empiri­cism was cer­tainly not accord­ing to our present (or Euclid­’s) ideas of abstrac­tion and rigour. Yet no math­e­mati­cian would want to exclude it from the fold—that period pro­duced math­e­mat­ics as first-­class as ever exist­ed! And even after the reign of rigour was essen­tially re-estab­lished with Cauchy, a very pecu­liar relapse into semi­-­phys­i­cal meth­ods took place with Rie­mann.

  9. Stephen Wol­fram men­tions a recent exam­ple I had­n’t run into used before, in a long dis­cus­sion of expand­ing Math­e­mat­ica to auto­mat­i­cally incor­po­rate old papers’ results

    Of course, there are all sorts of prac­ti­cal issues. Newer papers are pre­dom­i­nantly in TeX, so it’s not too dif­fi­cult to pull out the­o­rems with all their math­e­mat­i­cal nota­tion. But older papers need to be scanned, which requires math OCR, which has yet to be prop­erly devel­oped. Then there are issues like whether the­o­rems stated in papers are actu­ally valid. And even whether the­o­rems that were con­sid­ered valid, say, 100 years ago are still con­sid­ered valid today. For exam­ple, for con­tin­ued frac­tions, there are lots of pre-1950 the­o­rems that were suc­cess­fully proved in their time, but which ignore branch cuts, and so would­n’t be con­sid­ered cor­rect today. And in the end of course it requires lots of actu­al, skilled math­e­mati­cians to guide the cura­tion process, and to encode the­o­rems. But in a sense this kind of mobi­liza­tion of math­e­mati­cians is not com­pletely unfa­mil­iar; it’s some­thing like what was needed when Zen­tral­blatt was started in 1931, or Math­e­mat­i­cal Reviews in 1941.

  10. (), Melvyn B. Nathanson 2009:

    The his­tory of math­e­mat­ics is full of philo­soph­i­cally and eth­i­cally trou­bling reports about bad proofs of the­o­rems. For exam­ple, the states that every poly­no­mial of degree n with com­plex coef­fi­cients has exactly n com­plex roots. D’Alem­bert pub­lished a proof in 1746, and the the­o­rem became known “D’Alem­bert’s the­o­rem”, but the proof was wrong. Gauss pub­lished his first proof of the fun­da­men­tal the­o­rem in 1799, but this, too, had . Gauss’s sub­se­quent proofs, in 1816 and 1849, were OK. It seems to have been hard to deter­mine if a proof of the fun­da­men­tal the­o­rem of alge­bra was cor­rect. Why?

    was awarded a prize from King Oscar II of Swe­den and Nor­way for a paper on the , and his paper was pub­lished in Acta Math­e­mat­ica in 1890. But the pub­lished paper was not the prize-win­ning paper. The paper that won the prize con­tained seri­ous mis­takes, and Poin­care and other math­e­mati­cians, most impor­tant­ly, , engaged in a con­spir­acy to sup­press the truth and to replace the erro­neous paper with an exten­sively altered and cor­rected one.

    The three­-­body prob­lem is fas­ci­nat­ing as it gives us an exam­ple of a bad proof by Poin­caré & attempt to cover it up, but also an exam­ple of a impos­si­bil­ity proof: Bruns & Poin­caré proved in 1887 that the usual approaches could not work, typ­i­cally inter­preted as the 3 or n-body prob­lem being unsolv­able. Except in 1906/1909, pro­vided an (im­prac­ti­cal) algo­rithm using dif­fer­ent tech­niques to solve it. See “The Solu­tion of the n-body Prob­lem” & “A Visit to the New­ton­ian N-body Prob­lem via Ele­men­tary Com­plex Vari­ables”.↩︎

  11. , Hales 2014:

    Why use com­put­ers to ver­ify math­e­mat­ics? The sim­ple answer is that care­fully imple­mented proof check­ers make fewer errors than math­e­mati­cians (ex­cept J.-P. Ser­re). Incor­rect proofs of cor­rect state­ments are so abun­dant that they are impos­si­ble to cat­a­logue. Ralph Boas, for­mer exec­u­tive edi­tor of Math Reviews, once remarked that proofs are wrong “half the time” [Aus08]. Kem­pe’s claimed proof of the four-­color the­o­rem stood for more than a decade before Hea­wood refuted it [Mac01, p. 115]. “More than a thou­sand false proofs [of Fer­mat’s Last The­o­rem] were pub­lished between 1908 and 1912 alone” [Cor10]. Many pub­lished the­o­rems are like the hang­ing chad bal­lots of the 2000 U.S. pres­i­den­tial elec­tion, with scrawls too ambiva­lent for a clear yea or nay. One math­e­mati­cian even pro­posed to me that a new jour­nal is needed that unlike the oth­ers only pub­lishes reli­able results. Euclid gave us a method, but even he erred in the proof of the very first propo­si­tion of the Ele­ments when he assumed with­out proof that two cir­cles, each pass­ing through the oth­er’s cen­ter, must inter­sect. The con­cept that is needed to repair the gap in Euclid’s rea­son­ing is an inter­me­di­ate value the­o­rem. This defect was not reme­died until Hilbert’s Foun­da­tions of Geom­e­try. Exam­ples of widely accepted proofs of false or unprov­able state­ments show that our meth­ods of proof-check­ing are far from per­fect. Lagrange thought he had a proof of the par­al­lel pos­tu­late, but had enough doubt in his argu­ment to with­hold it from pub­li­ca­tion. In some cas­es, entire schools have become slop­py, such as the or real analy­sis before the rev­o­lu­tion in rigor towards the end of the nine­teenth cen­tu­ry. Plemelj’s 1908 accepted solu­tion to Hilbert’s 21st prob­lem on the mon­odromy of lin­ear dif­fer­en­tial equa­tions was refuted in 1989 by Boli­bruch. Aus­lan­der gives the exam­ple of a the­o­rem12 pub­lished by Warask­iewicz in 1937, gen­er­al­ized by Cho­quet in 1944, then refuted with a coun­terex­am­ple by Bing in 1948 [Aus08]. Another exam­ple is the approx­i­ma­tion prob­lem for Sobolev maps between two man­i­folds [Bet91], which con­tains a faulty proof of an incor­rect state­ment. The cor­rected the­o­rem appears in [HL03]. Such exam­ples are so plen­ti­ful that a Wiki page has been set up to clas­sify them, with ref­er­ences to longer dis­cus­sions at Math Over­flow [Wik11], [Ove09], [Ove10].

  12. , Simon Colton 2007:

    A more recent exam­ple was the dis­cov­ery that Andrew Wiles’ orig­i­nal proof of Fer­mat’s Last The­o­rem was flawed (but not, as it turned out, fatally flawed, as Wiles man­aged to fix the prob­lem (Singh, 1997))…­More recent­ly, Larry Wos has been using Otter to find smaller proofs of the­o­rems than the cur­rent ones. To this end, he uses Otter to find more suc­cinct meth­ods than those orig­i­nally pro­posed. This often results in detect­ing dou­ble nega­tions and remov­ing unnec­es­sary lem­mas, some of which were thought to be indis­pens­able. (Wos, 1996) presents a method­ol­ogy using a strat­egy known as res­o­nance to search for ele­gant proofs with Otter. He gives exam­ples from math­e­mat­ics and log­ic, and also argues that this work also impli­ca­tions for other fields such as cir­cuit design.

    (Fleu­riot & Paulson, 1998) have stud­ied the geo­met­ric proofs in New­ton’s Prin­cipia and inves­ti­gated ways to prove them auto­mat­i­cally with the Isabelle inter­ac­tive the­o­rem prover (Paulson, 1994). To do this, they for­mal­ized the Prin­cipia in both Euclid­ean geom­e­try and non-­s­tan­dard analy­sis. While work­ing through one of the key results (propo­si­tion 11 of book 1, the Kepler prob­lem) they dis­cov­ered an anom­aly in the rea­son­ing. New­ton was appeal­ing to a cross-­mul­ti­pli­ca­tion result which was­n’t true for infin­i­tes­i­mals or infi­nite num­bers. Isabelle could there­fore not prove the result, but Fleu­riot man­aged to derive an alter­na­tive proof of the the­o­rem that the sys­tem found accept­able.

  13. Colton 2007: “For exam­ple, Hea­wood dis­cov­ered a flaw in Kem­pe’s 1879 proof of the ,2 which had been accepted for 11 years.” It would ulti­mately be proved with a com­puter in 1976—­maybe.↩︎

  14. Hales 2014:

    The­o­rems that are cal­cu­la­tions or enu­mer­a­tions are espe­cially prone to error. Feyn­man laments, “I don’t notice in the morass of things that some­thing, a lit­tle limit or sign, goes wrong… . . I have math­e­mat­i­cally proven to myself so many things that aren’t true” [Fey00, p. 885]. Else­where, Feyn­man describes two teams of physi­cists who car­ried out a two-year cal­cu­la­tion of the elec­tron mag­netic moment and inde­pen­dently arrived at the same pre­dicted val­ue. When exper­i­ment dis­agreed with pre­dic­tion, the dis­crep­ancy was even­tu­ally traced to an arith­metic error made by the physi­cists, whose cal­cu­la­tions were not so inde­pen­dent as orig­i­nally believed [Fey85, p. 117]. Pon­trya­gin and Rokhlin erred in com­put­ing sta­ble homo­topy groups of spheres. Lit­tle’s tables of knots from 1885 con­tains dupli­cate entries that went unde­tected until 1974. In enu­mer­a­tive geom­e­try, in 1848, Steiner counted 7776 plane con­ics tan­gent to 5 gen­eral plane con­ics, when there are actu­ally only 3264. One of the most per­sis­tent blun­ders in the his­tory of math­e­mat­ics has been the mis­clas­si­fi­ca­tion (or mis­de­f­i­n­i­tion) of con­vex Archimedean poly­he­dra. Time and again, the pseudo rhom­bic cuboc­ta­he­dron has been over­looked or illog­i­cally excluded from the clas­si­fi­ca­tion (Fig­ure 21) [Grue11].

  15. Stigler is also kind in , empha­siz­ing that while many of the sta­tis­ti­cians involved in max­i­mum like­li­hood incor­rectly proved false claims, they were very pro­duc­tive mis­takes.↩︎

  16. “Fine Hall in its golden age: Remem­brances of Prince­ton in the early fifties”, Indis­crete Thoughts↩︎

  17. “Ten Lessons I wish I had been Taught”, Gian-­Carlo Rota 1996↩︎

  18. There are 2 20th cen­tury math­e­mati­cians, born too late to work with Fara­day, and the tele­graph inven­tor Samuel Morse who while over­lap­ping with Fara­day, has a Wikipedia entry men­tion­ing no work in math­e­mat­ics; I do not know which Morse may be meant.↩︎

  19. An exam­ple of this would be “An Endur­ing Error”, Branko Grün­baum:

    Math­e­mat­i­cal truths are immutable, but math­e­mati­cians do make errors, espe­cially when car­ry­ing out non-triv­ial enu­mer­a­tions. Some of the errors are “inno­cent”—plain mis­takes that get cor­rected as soon as an inde­pen­dent enu­mer­a­tion is car­ried out. For exam­ple, Daubleb­sky [14] in 1895 found that there are pre­cisely 228 types of con­fig­u­ra­tions (123), that is, col­lec­tions of 12 lines and 12 points, each inci­dent with three of the oth­ers. In fact, as found by Gropp [19] in 1990, the cor­rect num­ber is 229. Another exam­ple is pro­vided by the enu­mer­a­tion of the uni­form tilings of the 3-di­men­sional space by Andreini [1] in 1905; he claimed that there are pre­cisely 25 types. How­ev­er, as shown [20] in 1994, the cor­rect num­ber is 28. Andreini listed some tilings that should not have been includ­ed, and missed sev­eral oth­er­s—but again, these are sim­ple errors eas­ily cor­rect­ed…It is sur­pris­ing how errors of this type escape detec­tion for a long time, even though there is fre­quent men­tion of the results. One exam­ple is pro­vided by the enu­mer­a­tion of 4-di­men­sional sim­ple poly­topes with 8 facets, by Brück­ner [7] in 1909. He replaces this enu­mer­a­tion by that of 3-di­men­sional “dia­grams” that he inter­preted as Schlegel dia­grams of con­vex 4-poly­topes, and claimed that the enu­mer­a­tion of these objects is equiv­a­lent to that of the poly­topes. How­ev­er, aside from sev­eral “inno­cent” mis­takes in his enu­mer­a­tion, there is a fun­da­men­tal error: While to all 4-poly­topes cor­re­spond 3-di­men­sional dia­grams, there is no rea­son to assume that every dia­gram arises from a poly­tope. At the time of Brück­n­er’s paper, even the cor­re­spond­ing fact about 3-poly­he­dra and 2-di­men­sional dia­grams has not yet been estab­lished—this fol­lowed only from Steinitz’s char­ac­ter­i­za­tion of com­plexes that deter­mine con­vex poly­he­dra [45], [46]. In fact, in the case con­sid­ered by Brück­n­er, the assump­tion is not only unjus­ti­fied, but actu­ally wrong: One of Brück­n­er’s poly­topes does not exist, see [25].

    …Poly­he­dra have been stud­ied since antiq­ui­ty. It is, there­fore, rather sur­pris­ing that even con­cern­ing some of the poly­he­dra known since that time there is a lot of con­fu­sion, regard­ing both ter­mi­nol­ogy and essence. But even more unex­pected is the fact that many expo­si­tions of this topic com­mit seri­ous math­e­mat­i­cal and log­i­cal errors. More­over, this hap­pened not once or twice, but many times over the cen­turies, and con­tin­ues to this day in many printed and elec­tronic pub­li­ca­tions; the most recent case is in the sec­ond issue for 2008 of this jour­nal…With our under­stand­ings and exclu­sions, there are four­teen con­vex poly­he­dra that sat­isfy the local cri­te­rion and should be called “Archimedean”, but only thir­teen that sat­isfy the global cri­te­rion and are appro­pri­ately called “uni­form” (or “semi­reg­u­lar”). Rep­re­sen­ta­tives of the thir­teen uni­form con­vex poly­he­dra are shown in the sources men­tioned above, while the four­teenth poly­he­dron is illus­trated in Fig­ure 1. It sat­is­fies the local cri­te­rion but not the global one, and there­fore is—in our ter­mi­nol­o­gy—Archimedean but not uni­form. The his­tory of the real­iza­tion that the local cri­te­rion leads to four­teen poly­he­dra will be dis­cussed in the next sec­tion; it is remark­able that this devel­op­ment occurred only in the 20th cen­tu­ry. This implies that prior to the twen­ti­eth cen­tury all enu­mer­a­tions of the poly­he­dra sat­is­fy­ing the local cri­te­rion were mis­tak­en. Unfor­tu­nate­ly, many later enu­mer­a­tions make the same error.

  20. Dana Mack­inzie, The Uni­verse in Zero Words: The Story of Math­e­mat­ics as Told through Equa­tions (as quoted by John D. Cook):

    Fer­mat repeat­edly bragged about the n = 3 and n = 4 cases and posed them as chal­lenges to other math­e­mati­cians … But he never men­tioned the gen­eral case, n = 5 and high­er, in any of his let­ters. Why such restraint? Most like­ly, argues, because Fer­mat had real­ized that his “truly won­der­ful proof” did not work in those cas­es…Ev­ery math­e­mati­cian has had days like this. You think you have a great insight, but then you go out for a walk, or you come back to the prob­lem the next day, and you real­ize that your great idea has a flaw. Some­times you can go back and fix it. And some­times you can’t.

  21. From , “Fer­mat’s Last The­o­rem”:

    Much addi­tional progress was made over the next 150 years, but no com­pletely gen­eral result had been obtained. Buoyed by false con­fi­dence after his proof that pi is tran­scen­den­tal, the math­e­mati­cian Lin­de­mann pro­ceeded to pub­lish sev­eral proofs of Fer­mat’s Last The­o­rem, all of them invalid (Bell 1937, pp. 464–465). A prize of 100000 Ger­man marks, known as the Wolfskehl Prize, was also offered for the first valid proof (Ball and Cox­eter 1987, p. 72; Barner 1997; Hoff­man 1998, pp. 193–194 and 199).

    A recent false alarm for a gen­eral proof was raised by Y. Miyaoka (Cipra 1988) whose proof, how­ev­er, turned out to be flawed. Other attempted proofs among both pro­fes­sional and ama­teur math­e­mati­cians are dis­cussed by vos Savant (1993), although vos Savant erro­neously claims that work on the prob­lem by Wiles (dis­cussed below) is invalid.

  22. To take a ran­dom exam­ple (which could be mul­ti­plied indef­i­nite­ly); from Gödel and the Nature of Math­e­mat­i­cal Truth: A Talk with Rebecca Gold­stein (6.8.2005):

    Ein­stein told the philoso­pher of sci­ence that he’d known even before the solar eclipse of 1918 sup­ported his gen­eral the­ory of rel­a­tiv­ity that the the­ory must be true because it was so beau­ti­ful. And , who worked on both rel­a­tiv­ity the­ory and quan­tum mechan­ics, said “My work always tried to unite the true with the beau­ti­ful, but when I had to choose one or the oth­er, I usu­ally chose the beau­ti­ful.”…Math­e­mat­ics seems to be the one place where you don’t have to choose, where truth and beauty are always unit­ed. One of my all-­time favorite books is . tries to demon­strate to a gen­eral audi­ence that math­e­mat­ics is inti­mately about beau­ty. He gives as exam­ples two proofs, one show­ing that the square root of 2 is irra­tional, the other show­ing that there’s no largest prime num­ber. Sim­ple, eas­ily gras­pable proofs, that stir the soul with won­der.

  23. Nathanson 2009 claims the oppo­site:

    Many math­e­mati­cians have the oppo­site opin­ion; they do not or can­not dis­tin­guish the beauty or impor­tance of a the­o­rem from its proof. A the­o­rem that is first pub­lished with a long and dif­fi­cult proof is highly regard­ed. Some­one who, prefer­ably many years lat­er, finds a short proof is “bril­liant.” But if the short proof had been obtained in the begin­ning, the the­o­rem might have been dis­par­aged as an “easy result.” Erdős was a genius at find­ing bril­liantly sim­ple proofs of deep results, but, until recent­ly, much of his work was ignored by the math­e­mat­i­cal estab­lish­ment.

  24. From “Aes­thet­ics as a Lib­er­at­ing Force in Math­e­mat­ics Edu­ca­tion?”, by Nathalie Sin­clair (reprinted in The Best Writ­ing on Math­e­mat­ics 2010, ed. Mircea Piti­ci); pg208:

    There is a long tra­di­tion in math­e­mat­ics of describ­ing proofs and the­o­rems in aes­thetic terms, often using words such as ‘ele­gance’ and ‘depth’. Fur­ther, math­e­mati­cians have also argued that their sub­ject is more akin to an art than it is to a sci­ence (see ; Lit­tle­wood, 1986; Sul­li­van 1925/1956), and, like the arts, ascribe to math­e­mat­ics aes­thetic goals. For exam­ple, the math­e­mati­cian W. Krull (1930/1987) writes: “the pri­mary goals of the math­e­mati­cian are aes­thet­ic, and not epis­te­mo­log­i­cal” (p. 49). This state­ment seems con­tra­dic­tory with the oft-cited con­cern of math­e­mat­ics with find­ing or dis­cov­er­ing truths, but it empha­sises the fact that the math­e­mati­cian’s inter­est is in express­ing truth, and in doing so in clev­er, sim­ple, suc­cinct ways.

    While Krull focuses on math­e­mat­i­cal expres­sion, the math­e­mati­cian H. Poin­care (1908/1966) con­cerns him­self with the psy­chol­ogy of math­e­mat­i­cal inven­tion, but he too under­lines the aes­thetic dimen­sion of math­e­mat­ics, not the log­i­cal. In Poin­car­e’s the­o­ry, a large part of a math­e­mati­cian’s work is done at the sub­con­scious lev­el, where an aes­thetic sen­si­bil­ity is respon­si­ble for alert­ing the math­e­mati­cians to the most fruit­ful and inter­est­ing of ideas. Other math­e­mati­cians have spo­ken of this spe­cial sen­si­bil­ity as well and also in terms of the way it guides math­e­mati­cians to choose cer­tain prob­lems. This choice is essen­tial in math­e­matic given that there exists no exter­nal real­ity against which math­e­mati­cians can decide which prob­lems or which branches of math­e­mat­ics are impor­tant (see von Neu­mann, 1947 [“The Math­e­mati­cian”]): the choice involves human val­ues and pref­er­ence—and, indeed, these change over time, as exem­pli­fied by the dis­missal of geom­e­try by some promi­nent math­e­mati­cians in the early 20th cen­tury (see White­ley, 1999).

    • Lit­tle­wood, 1986: “The math­e­mati­cian’s art of work”; in B. Bol­lobas (ed.), Lit­tle­wood’s mis­cel­lany, Cam­bridge Uni­ver­sity press
    • Sul­li­van 1925/1956: “Math­e­mat­ics as an art”; in J. New­man (ed.), The world of math­e­mat­ics, vol 3 (p 2015–2021)
  25. From pg 211–212, Sin­clair 2009:

    The sur­vey of math­e­mati­cians con­ducted by Wells (1990) pro­vides a more empir­i­cal­ly-based chal­lenge to the intrin­sic view of the math­e­mat­i­cal aes­thet­ic. Wells obtained responses from over 80 math­e­mati­cians, who were asked to iden­tify the most beau­ti­ful the­o­rem from a given set of 24 the­o­rems. (These the­o­rems were cho­sen because they were ‘famous’, in the sense that Wells judged them to be well-­known by most math­e­mati­cians, and of inter­est to the dis­ci­pline in gen­er­al, rather than to a par­tic­u­lar sub­field.) Wells finds that the math­e­mati­cians var­ied widely in their judg­ments. More inter­est­ing­ly, in explain­ing their choic­es, the math­e­mati­cians revealed a wide range of per­sonal responses affect­ing their aes­thetic responses to the the­o­rems. Wells effec­tively puts to rest the belief that math­e­mati­cians have some kind of secret agree­ment on what counts as beau­ti­ful in math­e­mat­ic­s…Bur­ton’s (2004) work focuses on the prac­tices of math­e­mati­cians and their under­stand­ing of those prac­tices. Based on exten­sive inter­views with a wide range of math­e­mati­cian­s…She points out that math­e­mati­cians range on a con­tin­uum from unim­por­tant to cru­cial in terms of their posi­tion­ings on the role of the aes­thet­ic, with only 3 of the 43 math­e­mati­cians dis­miss­ing its impor­tance. For exam­ple, one said “Beauty does­n’t mat­ter. I have never seen a beau­ti­ful math­e­mat­i­cal paper in my life” (p. 65). Another math­e­mati­cian was ini­tially dis­mis­sive about math­e­mat­i­cal beauty but lat­er, when speak­ing about the review process, said: “If it was a very ele­gant way of doing things, I would be inclined to for­give a lot of faults” (p. 65).

  26. Tao left a lengthy com­ment on a pre­vi­ously linked Lip­ton post:

    It is pos­si­ble to gather rea­son­ably con­vinc­ing sup­port for a con­jec­ture by a vari­ety of means, long before it is actu­ally proven, though many math­e­mati­cians are reluc­tant to argue too strongly based on such sup­port due to the lack of rigour or the risk of embar­rass­ment in hind­sight. Exam­ples of sup­port include:

    • Numer­i­cal evi­dence; but one has to be care­ful in sit­u­a­tions where the null hypoth­e­sis would also give com­pa­ra­ble numer­i­cal evi­dence. The first ten tril­lion zeroes of zeta on the crit­i­cal line is, in my opin­ion, only mild evi­dence in favour of RH (the null hypoth­e­sis may be, for instance, that the zeroes go hay­wire once log log t becomes suf­fi­ciently large); but the numer­i­cal data on spac­ings of zeroes is quite con­vinc­ing evi­dence for the GUE hypoth­e­sis, in my view. (It is a pri­ori con­ceiv­able that the spac­ings are dis­trib­uted accord­ing to GUE plus another cor­rec­tion that dom­i­nates when log log t (say) is large, but this begins to strain Occam’s razor.)
    • Non-triv­ial spe­cial cas­es. But it depends on how “rep­re­sen­ta­tive” one believes the spe­cial cases to be. For instance, if one can ver­ify low-di­men­sional cases of a con­jec­ture that is true in high dimen­sions, this is usu­ally only weak (but not entirely insignif­i­cant) evi­dence, as it is pos­si­ble that there exist high­-di­men­sional patholo­gies that sink the con­jec­ture but can­not be folded up into a low-di­men­sional sit­u­a­tion. But if one can do all odd­-di­men­sional cas­es, and all even-di­men­sional cases up to dimen­sion 8 (say), then that begins to look more con­vinc­ing.
    • Proofs of par­al­lel, anal­o­gous, or sim­i­lar con­jec­tures. Par­tic­u­larly if these proofs were non-triv­ial and led to new insights and tech­niques. RH in func­tion fields is a good exam­ple here; it raises the hope of some sort of grand uni­fied approach to GRH that some­how han­dles all num­ber fields (or some other gen­eral class) simul­ta­ne­ous­ly.
    • Con­verse of the con­jec­ture is prov­able, and looks “opti­mal” some­how. One might be able to con­struct a list of all obvi­ous exam­ples of objects with prop­erty X, find sig­nif­i­cant dif­fi­culty extend­ing the list, and then con­jec­ture that this is list is com­plete. This is a com­mon way to make con­jec­tures, but can be dan­ger­ous, as one may sim­ply have a lack of imag­i­na­tion. So this is thin evi­dence by itself (many false con­jec­tures have arisen from this con­verse-­tak­ing method), but it does carry a lit­tle bit of weight once com­bined with other strands of evi­dence.
    • Con­jec­ture is ambi­tious and pow­er­ful, and yet is not imme­di­ately sunk by the obvi­ous con­sis­tency checks. This is vaguely anal­o­gous to the con­cept of a “fal­si­fi­able the­ory” in sci­ence. A strong con­jec­ture could have many pow­er­ful con­se­quences in a vari­ety of dis­parate areas of math­e­mat­ic­s—so pow­er­ful, in fact, that one would not be sur­prised that they could be dis­proven with var­i­ous coun­terex­am­ples. But sur­pris­ing­ly, when one checks the cases that one does under­stand quite well, the con­jec­ture holds up. A typ­i­cal exam­ple here might include a very gen­eral con­jec­tured iden­tity which, when spe­cialised to var­i­ous well-un­der­stood spe­cial cas­es, become a prov­able iden­ti­ty—but with the iden­tity in each spe­cial case being prov­able by very dif­fer­ent meth­ods, and the con­nec­tion between all the iden­ti­ties being mys­te­ri­ous other than via the con­jec­ture. The gen­eral con­jec­ture that the primes behave pseudo­ran­domly after account­ing for small mod­uli is an exam­ple of such a con­jec­ture; we usu­ally can’t con­trol how the primes behave, but when we can, the pseudo­ran­domess heuris­tic lines up per­fect­ly.
    • Attempts at dis­proof run into inter­est­ing obsta­cles. This one is a bit hard to for­malise, but some­times you can get a sense that attempts to dis­prove a con­jec­ture are fail­ing not due to one’s own lack of abil­i­ty, or due to acci­den­tal con­tin­gen­cies, but rather due to “enemy activ­ity”; some lurk­ing hid­den struc­ture to the prob­lem, cor­ners of which emerge every time one tries to build a coun­terex­am­ple. The ques­tion is then whether this “enemy” is stu­pid enough to be out­wit­ted by a suf­fi­ciently clever coun­terex­am­ple, or is pow­er­ful enough to block all such attempts. Iden­ti­fy­ing this enemy pre­cisely is usu­ally the key to resolv­ing the con­jec­ture (or trans­form­ing the con­jec­ture into a stronger and bet­ter con­jec­ture).
    • Con­jec­ture gen­er­alises to a broader con­jec­ture that enjoys sup­port of the types listed above. The twin prime con­jec­ture, by itself, is dif­fi­cult to sup­port on its own; but when it comes with an asymp­totic that one can then ver­ify numer­i­cally to high accu­racy and is a con­se­quence of the much more pow­er­ful prime tuples con­jec­ture (and more gen­er­al­ly, the pseudo­ran­dom­ness heuris­tic for the primes) which is sup­ported both because of its high fal­si­fi­a­bil­ity and also its nature as an opti­mal-look­ing con­verse (the only struc­ture to the primes are the “obvi­ous” struc­tures), it becomes much more con­vinc­ing. Another text­book exam­ple is the Poin­care con­jec­ture, which became much more con­vinc­ing after being inter­preted as a spe­cial case of geometri­sa­tion (which had a lot of sup­port, e.g. the two-di­men­sional ana­logue, Haken man­i­folds, lots of fal­si­fi­able pre­dic­tions, etc.).

    It can be fun (though a lit­tle risky, rep­u­ta­tion-­wise) to debate how strong var­i­ous pieces of evi­dence really are, but one soon reaches a point of dimin­ish­ing returns, as often we are lim­ited by our own igno­rance, lack of imag­i­na­tion, or cog­ni­tive bias­es. But we are at least rea­son­ably able to per­form rel­a­tive com­par­isons of the strength of evi­dence of two con­jec­tures in the same topic (I guess com­plex­ity the­ory is full of instances of this…).

  27. pg190–191 of Fas­ci­nat­ing Math­e­mat­i­cal Peo­ple, edited by Albers 2011:

    Guy: If I do any math­e­mat­ics at all I think I do it in my sleep.

    MP: Do you think a lot of math­e­mati­cians work that way?

    Guy: I do. Yes. The human brain is a remark­able thing, and we are a long way from under­stand­ing how it works. For most math­e­mat­i­cal prob­lems, imme­di­ate thought and pen­cil and paper—the usual things one asso­ciates with solv­ing math­e­mat­i­cal prob­lem­s—are just totally inad­e­quate. You need to under­stand the prob­lem, make a few sym­bols on paper and look at them. Most of us, as opposed to Erdős who would prob­a­bly give an answer to a prob­lem almost imme­di­ate­ly, would then prob­a­bly have to go off to bed, and, if we’re lucky, when we wake up in the morn­ing, we would already have some insight into the prob­lem. On those rare occa­sions when I have such insight, I quite often don’t know that I have it, but when I come to work on the prob­lem again, to put pen­cil to paper, some­how the ideas just seem to click togeth­er, and the thing goes through. It is clear to me that my brain must have gone on, in an almost com­bi­na­to­r­ial way, check­ing the cases or doing an enor­mous num­ber of fairly triv­ial arith­meti­cal com­pu­ta­tions. It seems to know the way to go. I first noticed this with , which are indeed finite com­bi­na­to­r­ial prob­lems. The first indi­ca­tion that I was inter­ested in com­bi­na­toric­s—I did­n’t know I had the inter­est, and I did­n’t even know there was such a sub­ject as com­bi­na­toric­s—was that I used to com­pose chess endgames. I would sit up late into the night try­ing to ana­lyze a posi­tion. Even­tu­ally I would sink into slum­ber and wake up in the morn­ing to real­ize that if I had only moved the pawns over one file the whole thing would have gone through clear­ly. My brain must have been check­ing over this finite but mod­er­ately large num­ber of pos­si­bil­i­ties dur­ing the night. I think a lot of math­e­mati­cians must work that way.

    MP: Have you talked to any other math­e­mati­cians about that?

    Guy: No. But in Jacques Hadamard’s book on inven­tion in the math­e­mat­i­cal field, he quotes some exam­ples there where it is fairly clear that peo­ple do that kind of thing. There was some­one ear­lier this week who was talk­ing about Jean-­Paul Serre. He said that if you ask Serre a ques­tion he either gives you the answer imme­di­ate­ly, or, if he hes­i­tates, and you push him in any way, he will say, “How can I think about the ques­tion when I don’t know the answer?” I thought that was a lovely remark. At a much lower lev­el, one should think, “What shape should the answer be?” Then your mind can start check­ing whether you’re right and how to find some log­i­cal sequence to get you where you want to go.

  28. Jan­u­ary 14, 1974, in “Con­ver­sa­tions with Gian-­Carlo Rota”; as quoted on pg262 of Tur­ing’s Cathe­dral (2012) by :

    Once in my life I had a math­e­mat­i­cal dream which proved cor­rect. I was twenty years old. I thought, my God, this is won­der­ful, I won’t have to work, it will all come in dreams! But it never hap­pened again.

  29. J Thomas:

    Once after I had spent sev­eral days try­ing to prove a topol­ogy the­o­rem, I dreamed about it and woke up with as coun­terex­am­ple. In the dream it just con­structed itself, and I could see it. I did­n’t have a fever then, though. Later one of my teach­ers, an old Pol­ish wom­an, explained her expe­ri­ence. She kept a note­book by her bed so she could write down any insights she got in her sleep. She woke up in the night with a won­der­ful proof, and wrote it down, and in the morn­ing when she looked at it it was all garbage. “You can­not do math in your sleep. You will have to work.”

  30. “Higher alge­braic K-the­ory of schemes and of derived cat­e­gories”, Thoma­son & Trobaugh 1990:

    The first author must state that his coau­thor and close friend, Tom Trobaugh, quite intel­li­gent, sin­gu­larly orig­i­nal, and inor­di­nately gen­er­ous, killed him­self con­se­quent to endoge­nous depres­sion. 94 days lat­er, in my dream, Tom’s sim­u­lacrum remarked, “The direct limit char­ac­ter­i­za­tion of per­fect com­plexes shows that they extend, just as one extends a coher­ent sheaf.” Awak­ing with a start, I knew this idea had to be wrong, since some per­fect com­plexes have a non-­van­ish­ing K0 obstruc­tion to exten­sion. I had worked on this prob­lem for 3 years, and saw this approach to be hope­less. But Tom’s sim­u­lacrum had been so insis­tent, I knew he would­n’t let me sleep undis­turbed until I had worked out the argu­ment and could point to the gap. This work quickly led to the key results of this paper. To Tom, I could have explained why he must be listed as a coau­thor.

  31. , (1945), pg27

    Let us come to math­e­mati­cians. One of them, Mail­let, started a first inquiry as to their meth­ods of work. One famous ques­tion, in par­tic­u­lar, was already raised by him that of the “math­e­mat­i­cal dream”, it hav­ing been sug­gested often that the solu­tion of prob­lems that have defied inves­ti­ga­tion may appear in dreams. Though not assert­ing the absolute non-ex­is­tence of “math­e­mat­i­cal dreams”, Mail­let’s inquiry shows that they can­not be con­sid­ered as hav­ing a seri­ous sig­nif­i­cance. Only one remark­able obser­va­tion is reported by the promi­nent Amer­i­can math­e­mati­cian, Leonard Eugene Dick­son, who can pos­i­tively assert its accu­ra­cy…Ex­cept for that very curi­ous case, most of the 69 cor­re­spon­dents who answered Mail­let on that ques­tion never expe­ri­enced any math­e­mat­i­cal dream (I never did) or, in that line, dreamed of wholly absurd things, or were unable to state pre­cisely the ques­tion they hap­pened to dream of. 5 dreamed of quite naive argu­ments. There is one more pos­i­tive answer; but it is dif­fi­cult to take account of it, as its author remains anony­mous.

  32. From his 1993 “How to Write a Proof”:

    Anec­do­tal evi­dence sug­gests that as many as a third of all papers pub­lished in math­e­mat­i­cal jour­nals con­tain mis­takes—not just minor errors, but incor­rect the­o­rems and proof­s…My infor­ma­tion about math­e­mati­cians’ errors and embar­rass­ment comes mainly from George Bergman.

  33. 1993 “How to Write a Proof”:

    Some twenty years ago, I decided to write a proof of the Schroed­er-Bern­stein the­o­rem for an intro­duc­tory math­e­mat­ics class. The sim­plest proof I could find was in Kel­ley’s clas­sic gen­eral topol­ogy text [4, page 28]. Since Kel­ley was writ­ing for a more sophis­ti­cated audi­ence, I had to add a great deal of expla­na­tion to his half-­page proof. I had writ­ten five pages when I real­ized that Kel­ley’s proof was wrong. Recent­ly, I wanted to illus­trate a lec­ture on my proof style with a con­vinc­ing incor­rect proof, so I turned to Kel­ley. I could find noth­ing wrong with his proof; it seemed obvi­ously cor­rect! Read­ing and reread­ing the proof con­vinced me that either my mem­ory had failed, or else I was very stu­pid twenty years ago. Still, Kel­ley’s proof was short and would serve as a nice exam­ple, so I started rewrit­ing it as a struc­tured proof. Within min­utes, I redis­cov­ered the error.

    My inter­est in proofs stems from writ­ing cor­rect­ness proofs of algo­rithms. These proofs are sel­dom deep, but usu­ally have con­sid­er­able detail. Struc­tured proofs pro­vided a way of cop­ing with this detail. The style was first applied to proofs of ordi­nary the­o­rems in a paper I wrote with Mar­tin Abadi [2]. He had already writ­ten con­ven­tional proof­s|proofs that were good enough to con­vince us and, pre­sum­ably, the ref­er­ees. Rewrit­ing the proofs in a struc­tured style, we dis­cov­ered that almost every one had seri­ous mis­takes, though the the­o­rems were cor­rect. Any hope that incor­rect proofs might not lead to incor­rect the­o­rems was destroyed in our next col­lab­o­ra­tion [1]. Time and again, we would make a con­jec­ture and write a proof sketch on the black­board—a sketch that could eas­ily have been turned into a con­vinc­ing con­ven­tional proof—only to dis­cov­er, by try­ing to write a struc­tured proof, that the con­jec­ture was false. Since then, I have never believed a result with­out a care­ful, struc­tured proof. My skep­ti­cism has helped avoid numer­ous errors.

    , Lam­port 2011:

    My ear­lier paper on struc­tured proofs described how effec­tive they are at catch­ing errors. It recounted how only by writ­ing such a proof was I able to re-dis­cover an error in a proof of the Schroed­er-Bern­stein the­o­rem in a well-­known topol­ogy text [2, page 28]. I recently received email from a math­e­mati­cian say­ing that he had tried unsuc­cess­fully to find that error by writ­ing a struc­tured proof. I asked him to send me his proof, and he respond­ed:

    I tried typ­ing up the proof that I’d hand-writ­ten, and in the process, I think I’ve found the fun­da­men­tal error. . . I now really begin to under­stand what you mean about the power of this method, even if it did take me hours to get to this point!

    It is instruc­tive that, to find the error, he had to re-write his proof to be read by some­one else. Elim­i­nat­ing errors requires care.

  34. , 1996:

    Twenty years ago it was rea­son­able to pre­dict that the size and ambi­tion of soft­ware prod­ucts would be severely lim­ited by the unre­li­a­bil­ity of their com­po­nent pro­grams. Crude esti­mates sug­gest that pro­fes­sion­ally writ­ten pro­grams deliv­ered to the cus­tomer can con­tain between one and ten inde­pen­dently cor­rectable errors per thou­sand lines of code; and any soft­ware error in prin­ci­ple can have spec­tac­u­lar effect (or worse: a sub­tly mis­lead­ing effect) on the behav­iour of the entire sys­tem. Dire warn­ings have been issued..The argu­ments were suf­fi­ciently per­sua­sive to trig­ger a sig­nif­i­cant research effort devoted to the prob­lem of pro­gram cor­rect­ness. A pro­por­tion of this research was based on the ideal of cer­tainty achieved by math­e­mat­i­cal proof.

  35. Hoare 1996:

    For­tu­nate­ly, the prob­lem of pro­gram cor­rect­ness has turned out to be far less seri­ous than pre­dict­ed. A recent analy­sis by Macken­zie has shown that of sev­eral thou­sand deaths so far reli­ably attrib­uted to depen­dence on com­put­ers, only ten or so can be explained by errors in the soft­ware: most of these were due to a cou­ple of instances of incor­rect dosage cal­cu­la­tions in the treat­ment of can­cer by radi­a­tion. Sim­i­larly pre­dic­tions of col­lapse of soft­ware due to size have been fal­si­fied by con­tin­u­ous oper­a­tion of real-­time soft­ware sys­tems now mea­sured in tens of mil­lions of lines of code, and sub­jected to thou­sands of updates per year…And air­craft, both civil and mil­i­tary, are now fly­ing with the aid of soft­ware mea­sured in mil­lions of lines—though not all of it is safe­ty-­crit­i­cal. Com­pil­ers and oper­at­ing sys­tems of a sim­i­lar size now num­ber their sat­is­fied cus­tomers in mil­lions. So the ques­tions arise: why have twenty years of pes­simistic pre­dic­tions been fal­si­fied? Was it due to suc­cess­ful appli­ca­tion of the results of the research which was moti­vated by the pre­dic­tions? How could that be, when clearly lit­tle soft­ware has ever has been sub­jected to the rigours of for­mal proof?

  36. Hoare 1996:

    Suc­cess in the use of math­e­mat­ics for spec­i­fi­ca­tion, design and code reviews does not require strict for­mal­i­sa­tion of all the proofs. Infor­mal rea­son­ing among those who are flu­ent in the idioms of math­e­mat­ics is extremely effi­cient, and remark­ably reli­able. It is not immune from fail­ure; for exam­ple sim­ple mis­prints can be sur­pris­ingly hard to detect by eye. For­tu­nate­ly, these are exactly the kind of error that can be removed by early tests. More for­mal cal­cu­la­tion can be reserved for the most cru­cial issues, such as inter­rupts and recov­ery pro­ce­dures, where bugs would be most dan­ger­ous, expen­sive, and most dif­fi­cult to diag­nose by test­s…­Many more tests should be designed than there will ever be time to con­duct; they should be gen­er­ated as directly as pos­si­ble from the spec­i­fi­ca­tion, prefer­ably auto­mat­i­cally by com­puter pro­gram. Ran­dom selec­tion at the last minute will pro­tect against the dan­ger that under pres­sure of time the pro­gram will be adapted to pass the tests rather than meet­ing the rest of its spec­i­fi­ca­tion. There is some evi­dence that early atten­tion to a com­pre­hen­sive and rig­or­ous test strat­egy can improve reli­a­bil­ity of a deliv­ered pro­duct, even when at the last minute there was no time to con­duct the tests before deliv­ery!

  37. The miss­ing steps may be quite dif­fi­cult to fully prove, though; Nathanson 2009:

    There is a lovely but prob­a­bly apoc­ryphal anec­dote about . Teach­ing a class at MIT, he wrote some­thing on the black­board and said it was ‘obvi­ous.’ One stu­dent had the temer­ity to ask for a proof. Weiner started pac­ing back and forth, star­ing at what he had writ­ten on the board and say­ing noth­ing. Final­ly, he left the room, walked to his office, closed the door, and worked. After a long absence he returned to the class­room. ‘It is obvi­ous’, he told the class, and con­tin­ued his lec­ture.

  38. What con­di­tions count as full scrutiny by the math com­mu­nity may not be too clear; Nathanson 2009 tren­chantly mocks math talks:

    Social pres­sure often hides mis­takes in proofs. In a sem­i­nar lec­ture, for exam­ple, when a math­e­mati­cian is prov­ing a the­o­rem, it is tech­ni­cally pos­si­ble to inter­rupt the speaker in order to ask for more expla­na­tion of the argu­ment. Some­times the details will be forth­com­ing. Other times the response will be that it’s “obvi­ous” or “clear” or “fol­lows eas­ily from pre­vi­ous results.” Occa­sion­ally speak­ers respond to a ques­tion from the audi­ence with a look that con­veys the mes­sage that the ques­tioner is an idiot. That’s why most math­e­mati­cians sit qui­etly through sem­i­nars, under­stand­ing very lit­tle after the intro­duc­tory remarks, and applaud­ing politely at the end of a mostly wasted hour.

  39. “Math­e­mati­cians solve 45-year-old Ker­vaire invari­ant puz­zle”, Erica Klar­re­ich 2009↩︎

  40. “Why Did Lagrange ‘Prove’ the Par­al­lel Pos­tu­late?”, Gra­biner 2009:

    It is true that Lagrange never did pub­lish it, so he must have real­ized there was some­thing wrong. In another ver­sion of the sto­ry, told by , who claims to have been there (though the min­utes do not list his name), every­body there could see that some­thing was wrong, so Lagrange’s talk was fol­lowed by a moment of com­plete silence [2, p. 84]. Still, Lagrange kept the man­u­script with his papers for pos­ter­ity to read.

    Why work on it at all?

    The his­tor­i­cal focus on the fifth pos­tu­late came because it felt more like the kind of thing that gets proved. It is not self­-ev­i­dent, it requires a dia­gram even to explain, so it might have seemed more as though it should be a the­o­rem. In any case, there is a tra­di­tion of attempted proofs through­out the Greek and then Islamic and then eigh­teen­th-­cen­tury math­e­mat­i­cal worlds. Lagrange fol­lows many eigh­teen­th-­cen­tury math­e­mati­cians in see­ing the lack of a proof of the fifth pos­tu­late as a seri­ous defect in . But Lagrange’s crit­i­cism of the pos­tu­late in his man­u­script is unusu­al. He said that the assump­tions of geom­e­try should be demon­stra­ble “just by the ”-the same way, he said, that we know the axiom that the whole is greater than the part [32, p. 30R]. The the­ory of par­al­lels rests on some­thing that is not self­-ev­i­dent, he believed, and he wanted to do some­thing about this.

    What was the strange and alien to the mod­ern mind approach that Lagrange used?

    Recall that Lagrange said in this man­u­script that axioms should fol­low from the prin­ci­ple of con­tra­dic­tion. But, he added, besides the prin­ci­ple of con­tra­dic­tion, “There is another prin­ci­ple equally self­-ev­i­dent,” and that is Leib­niz’s . That is: noth­ing is true “unless there is a suf­fi­cient rea­son why it should be so and not oth­er­wise” [42, p. 31; ital­ics added]. This, said Lagrange, gives as solid a basis for math­e­mat­i­cal proof as does the prin­ci­ple of con­tra­dic­tion [32, p. 30V]. But is it legit­i­mate to use the prin­ci­ple of suf­fi­cient rea­son in math­e­mat­ics? Lagrange said that we are jus­ti­fied in doing this, because it has already been done. For exam­ple, Archimedes to estab­lish that equal weights at equal dis­tances from the ful­crum of a lever bal­ance. Lagrange added that we also use it to show that three equal forces act­ing on the same point along lines sep­a­rated by a third of the cir­cum­fer­ence of a cir­cle are in equi­lib­rium [32, pp. 31R-31V]…The mod­ern reader may object that Lagrange’s sym­me­try argu­ments are, like the unique­ness of par­al­lels, equiv­a­lent to Euclid’s pos­tu­late. But the log­i­cal cor­rect­ness, or lack there­of, of Lagrange’s proof is not the point. (In this man­u­script, by the way, Lagrange went on to give an anal­o­gous proof—also by the prin­ci­ple of suf­fi­cient rea­son-that between two points there is just one straight line, because if there were a sec­ond straight line on one side of the first, we could then draw a third straight line on the other side, and so on [32, pp. 34R-34V]. Lagrange, then, clearly liked this sort of argu­men­t.)

    …Why did philoso­phers con­clude that space had to be infinite, homo­ge­neous, and the same in all direc­tions? Effec­tive­ly, because of the prin­ci­ple of suf­fi­cient rea­son. For instance, in 1600 argued that the uni­verse must be infi­nite because there is no rea­son to stop at any point; the exis­tence of an infin­ity of worlds is no less rea­son­able than the exis­tence of a finite num­ber of them. Descartes used sim­i­lar rea­son­ing in his Prin­ci­ples of Phi­los­o­phy: “We rec­og­nize that this world. . . has no lim­its in its exten­sion. . . . Wher­ever we imag­ine such lim­its, we . . . imag­ine beyond them some indef­i­nitely extended space” [28, p. 104]. Sim­i­lar argu­ments were used by other sev­en­teen­th-­cen­tury authors, includ­ing New­ton. Descartes iden­ti­fied space and the exten­sion of mat­ter, so geom­e­try was, for him, about real phys­i­cal space. But geo­met­ric space, for Descartes, had to be Euclid­ean…Descartes, some 50 years before New­ton pub­lished his first law of motion, was a co-dis­cov­erer of what we call lin­ear iner­tia: that in the absence of exter­nal influ­ences a mov­ing body goes in a straight line at a con­stant speed. Descartes called this the first law of nature, and for him, this law fol­lows from what we now rec­og­nize as the prin­ci­ple of suf­fi­cient rea­son. Descartes said, “Nor is there any rea­son to think that, if [a part of mat­ter] moves. . . and is not impeded by any­thing, it should ever by itself cease to move with the same force” [30, p. 75]…Leib­niz, by con­trast, did not believe in absolute space. He not only said that spa­tial rela­tions were just the rela­tions between bod­ies, he used the prin­ci­ple of suf­fi­cient rea­son to show this. If there were absolute space, there would have to be a rea­son to explain why two objects would be related in one way if East is in one direc­tion and West in the oppo­site direc­tion, and related in another way if East and West were reversed [24, p. 147]. Sure­ly, said Leib­niz, the rela­tion between two objects is just one thing! But Leib­niz did use argu­ments about sym­me­try and suf­fi­cient rea­son-­suf­fi­cient rea­son was his prin­ci­ple, after all. Thus, although Descartes and Leib­niz did not believe in empty absolute space and New­ton did, they all agreed that what I am call­ing the Euclid­ean prop­er­ties of space are essen­tial to physics.

    …In his 1748 essay “Reflec­tions on Space and Time”, Euler argued that space must be real; it can­not be just the rela­tions between bod­ies as the Leib­nizians claim [10]. This is because of the prin­ci­ples of mechan­ic­s—that is, New­ton’s first and sec­ond laws. These laws are beyond doubt, because of the “mar­velous” agree­ment they have with the observed motions of bod­ies. The iner­tia of a sin­gle body, Euler said, can­not pos­si­bly depend on the behav­ior of other bod­ies. The con­ser­va­tion of uni­form motion in the same direc­tion makes sense, he said, only if mea­sured with respect to immov­able space, not to var­i­ous other bod­ies. And space is not in our minds, said Euler; how can physic­s-real physic­s-de­pend on some­thing in our mind­s?…in his of 1781, Kant placed space in the mind nonethe­less. We order our per­cep­tions in space, but space itself is in the mind, an intu­ition of the intel­lect. Nev­er­the­less, Kan­t’s space turned out to be Euclid­ean too. Kant argued that we need the intu­ition of space to prove the­o­rems in geom­e­try. This is because it is in space that we make the con­struc­tions nec­es­sary to prove the­o­rems. And what the­o­rem did Kant use as an exam­ple? The sum of the angles of a tri­an­gle is equal to two right angles, a result whose proof requires the truth of the par­al­lel pos­tu­late [26, “Of space,” p. 423]…La­grange him­self is sup­posed to have said that spher­i­cal trigonom­e­try does not need Euclid’s par­al­lel pos­tu­late [4, pp. 52–53]. But the sur­face of a sphere, in the eigh­teen­th-­cen­tury view, is not non-­Euclid­ean; it exists in 3-di­men­sional Euclid­ean space [20, p. 71]. The exam­ple of the sphere helps us see that the eigh­teen­th-­cen­tury dis­cus­sion of the par­al­lel pos­tu­late’s rela­tion­ship to the other pos­tu­lates is not really about what is log­i­cally pos­si­ble, but about what is true of real space.

    The final step:

    was one of the math­e­mati­cians who worked on the prob­lem of Pos­tu­late 5. Lam­bert explic­itly rec­og­nized that he had not been able to prove it, and con­sid­ered that it might always have to remain a pos­tu­late. He even briefly sug­gested a pos­si­ble geom­e­try on a sphere with an imag­i­nary radius. But Lam­bert also observed that the par­al­lel pos­tu­late is related to the law of the lever [20, p. 75]. He said that a lever with weight­less arms and with equal weights at equal dis­tances is bal­anced by a force in the oppo­site direc­tion at the cen­ter equal to the sum of the weights, and that all these forces are par­al­lel. So either we are using the par­al­lel pos­tu­late, or per­haps, Lam­bert thought, some day we could use this phys­i­cal result to prove the par­al­lel pos­tu­late…These men did not want to do mechan­ics, as, say, New­ton had done. They wanted to show not only that the world was this way, but that it nec­es­sar­ily had to be. A mod­ern philo­soph­i­cal crit­ic, Hel­mut Pul­te, has said that Lagrange’s attempt to “reduce” mechan­ics to analy­sis strikes us today as “a mis­placed endeav­our to math­e­ma­tize. . . an empir­i­cal sci­ence, and thus to endow it with infal­li­bil­ity” [39, p. 220]. Lagrange would have respond­ed, “Right! That’s just exactly what we are all doing.”

  41. Sup­pos­ing P=NP:

    Much of CS the­ory would dis­ap­pear. In my own research some of Ken’s and my “best” results would sur­vive, but many would be destroyed. The Karp-Lip­ton The­o­rem is gone in this world. Ditto all “dichotomy” results between P and NP-­com­plete, and for P = #P, Jin-Y­i’s sim­i­lar work. Many bar­rier results, such as ora­cle the­o­rems and nat­ural proofs, lose their main moti­va­tion, while much fine struc­ture in hard­ness-ver­sus-ran­dom­ness trade­offs would be blown up. The PCP The­o­rem and all the related work is gone. Mod­ern cryp­tog­ra­phy could sur­vive if the algo­rithm were galac­tic, but oth­er­wise would be in trou­ble. I am cur­rently teach­ing Com­plex­ity The­ory at Tech using the text­book by San­jeev Arora and Boaz Barak…­Most of the 573 pages of Aro­ra-Barak would be gone:

    • Delete all of chap­ter 3 on NP.
    • Delete all of chap­ter 5 on the poly­no­mial hier­ar­chy.
    • Delete most of chap­ter 6 on cir­cuits.
    • Delete all of chap­ter 7 on prob­a­bilis­tic com­pu­ta­tion.
    • Mark as dan­ger­ous chap­ter 9 on cryp­tog­ra­phy.
    • Delete most of chap­ter 10 on quan­tum com­pu­ta­tion—who would care about Shor’s algo­rithm then?
    • Delete all of chap­ter 11 on the PCP the­o­rem.

    I will stop here. Most of the ini­tial part of the book is gone. The same for much of Home­r-Sel­man, and basi­cally all of the “Reducibil­ity and Com­plete­ness” CRC chap­ter.