Skip to main content

math directory


“HTPS: HyperTree Proof Search for Neural Theorem Proving”, Lample et al 2022

“HTPS: HyperTree Proof Search for Neural Theorem Proving”⁠, Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet, Amaury Hayat, Gabriel Ebner, Aurélien Rodriguez et al (2022-05-23; ):

We propose an online training procedure for a transformer-based automated theorem prover.

Our approach leverages a new search algorithm, HyperTree Proof Search (HTPS), inspired by the recent success of AlphaZero⁠. Our model learns from previous proof searches through online training, allowing it to generalize to domains far from the training distribution.

We report detailed ablations of our pipeline’s main components by studying performance on three environments of increasing complexity.

In particular, we show that with HTPS alone, a model trained on annotated proofs manages to prove 65.4% of a held-out set of Metamath theorems, substantially outperforming the previous state of the art of 56.5% by GPT-f. Online training on these unproved theorems increases accuracy to 82.6%. With a similar computational budget, we improve the state of the art on the Lean-based miniF2F-curriculum dataset from 31% to 42% proving accuracy.

“End-to-end Symbolic Regression With Transformers”, Kamienny et al 2022

“End-to-end symbolic regression with transformers”⁠, Pierre-Alexandre Kamienny, Stéphane d’Ascoli, Guillaume Lample, François Charton (2022-04-22; ):

Symbolic regression, the task of predicting the mathematical expression of a function from the observation of its values, is a difficult task which usually involves a two-step procedure: predicting the “skeleton” of the expression up to the choice of numerical constants, then fitting the constants by optimizing a non-convex loss function. The dominant approach is genetic programming, which evolves candidates by iterating this subroutine a large number of times. Neural networks have recently been tasked to predict the correct skeleton in a single try, but remain much less powerful. In this paper, we challenge this two-step procedure, and task a Transformer to directly predict the full mathematical expression, constants included. One can subsequently refine the predicted constants by feeding them to the non-convex optimizer as an informed initialization. We present ablations to show that this end-to-end approach yields better results, sometimes even without the refinement step. We evaluate our model on problems from the SRBench benchmark and show that our model approaches the performance of state-of-the-art genetic programming with several orders of magnitude faster inference.

“Deep Symbolic Regression for Recurrent Sequences”, d’Ascoli et al 2022

“Deep Symbolic Regression for Recurrent Sequences”⁠, Stéphane d’Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton (2022-01-12; ; similar):

Symbolic regression⁠, i.e. predicting a function from the observation of its values, is well-known to be a challenging task.

In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature.

We evaluate our integer model on a subset of OEIS sequences, and show that it outperforms built-in Wolfram Mathematica functions for recurrence prediction. We also demonstrate that our float model is able to yield informative approximations of out-of-vocabulary functions and constants, eg. bessel0(x) ≈ sin(x)+cos(x) / √πx and 1.644934 ≈ π2⁄6.

An interactive demonstration of our models is provided at⁠.

“A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”, Drori et al 2021

“A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”⁠, Iddo Drori, Sunny Tran, Roman Wang, Newman Cheng, Kevin Liu, Leonard Tang, Elizabeth Ke, Nikhil Singh et al (2021-12-31; ⁠, ; similar):

We demonstrate that a neural network pre-trained on text and fine-tuned on code solves Mathematics problems by program synthesis. We turn questions into programming tasks, automatically generate programs, and then execute them, perfectly solving university-level problems from MIT’s large Mathematics courses (Single Variable Calculus 18.01, Multivariable Calculus 18.02, Differential Equations 18.03, Introduction to Probability & Statistics 18.05, Linear Algebra 18.06, and Mathematics for Computer Science 6.042) as well as questions from a MATH dataset (on Prealgebra, Algebra, Counting and Probability, Number Theory, and Precalculus), the latest benchmark of advanced mathematics problems specifically designed to assess mathematical reasoning. We explore prompt generation methods that enable Transformers to generate question solving programs for these subjects, including solutions with plots. We generate correct answers for a random sample of questions in each topic. We quantify the gap between the original and transformed questions and perform a survey to evaluate the quality and difficulty of generated questions. This is the first work to automatically solve, grade, and generate university-level Mathematics course questions at scale which represents a milestone for higher education.

“What Is the Point of Computers? A Question for Pure Mathematicians”, Buzzard 2021

“What is the point of computers? A question for pure mathematicians”⁠, Kevin Buzzard (2021-12-22; ⁠, ):

We discuss the idea that computers might soon help mathematicians to prove theorems in areas where they have not previously been useful. Furthermore we argue that these same computer tools will also help us in the communication and teaching of mathematics.

“Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, Rae et al 2021

“Scaling Language Models: Methods, Analysis & Insights from Training Gopher”⁠, Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides et al (2021-12-08; ⁠, ; backlinks; similar):

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales—from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model’s behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.

“Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021

“Training Verifiers to Solve Math Word Problems”⁠, Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman et al (2021-10-27; ⁠, ; backlinks; similar):

[blog] State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems.

We find that even the largest transformer models fail to achieve high test performance, despite the conceptual simplicity of this problem distribution. To increase performance, we propose training verifiers to judge the correctness of model completions. At test time, we generate many candidate solutions and select the one ranked highest by the verifier.

We demonstrate that verification substantially improves performance on GSM8K, and we provide strong empirical evidence that verification scales more effectively with increased data than a finetuning baseline.

“Is This the Simplest (and Most Surprising) Sorting Algorithm Ever?”, Fung 2021

“Is this the simplest (and most surprising) sorting algorithm ever?”⁠, Stanley P. Y. Fung (2021-10-03; ; similar):

[animation⁠; video] We present an extremely simple sorting algorithm. It may look like it is obviously wrong, but we prove that it is in fact correct. We compare it with other simple sorting algorithms [it is a kind of insertion sort], and analyse some of its curious properties.

Algorithm 1: ICan’tBelieveItCanSort (A[1..n]):

  for i = 1 to n do for j = 1 to n do if A[i] < A[j] then swap A[i] A[j]  

…Algorithm 1 was found when the author was making up some wrong sorting algorithms.

“Basins With Tentacles”, Zhang & Strogatz 2021

“Basins with tentacles”⁠, Yuanzhao Zhang, Steven H. Strogatz (2021-06-10; similar):

To explore basin geometry in high-dimensional dynamical systems, we consider a ring of identical Kuramoto oscillators. Many attractors coexist in this system; each is a twisted periodic orbit characterized by a winding number q, with basin size proportional to ekq2^^.

We uncover the geometry behind this size distribution and find the basins are octopus-like, with nearly all their volume in the tentacles, not the head of the octopus (the ball-like region close to the attractor).

We present a simple geometrical reason why basins with tentacles should be common in high-dimensional systems.

“Behavioral and Neuronal Representation of Numerosity Zero in the Crow”, Kirschhock et al 2021

2021-kirschhock.pdf: “Behavioral and Neuronal Representation of Numerosity Zero in the Crow”⁠, Maximilian E. Kirschhock, Helen M. Ditz, Andreas Nieder (2021-06-02; ; similar):

[media] Different species of animals can discriminate numerosity, the countable number of objects in a set. The representations of countable numerosities have been deciphered down to the level of single neurons. However, despite its importance for human number theory, a special numerical quantity, the empty set (numerosity zero), has remained largely unexplored. We explored the behavioral and neuronal representation of the empty set in carrion crows⁠.

Crows were trained to discriminate small numerosities including the empty set. Performance data showed a numerical distance effect for the empty set in one crow, suggesting that the empty set and countable numerosities are represented along the crows’ “mental number line.” Single-cell recordings in the endbrain region nidopallium caudolaterale (NCL) showed a considerable proportion of NCL neurons tuned to the preferred numerosity zero. As evidenced by neuronal distance and size effects, NCL neurons integrated the empty set in the neural number line. A subsequent neuronal population analysis using a statistical classifier approach showed that the neuronal numerical representations were predictive of the crows’ success in the task. These behavioral and neuronal data suggests that the conception of the empty set as a cognitive precursor of a zero-like number concept is not an exclusive property of the cerebral cortex of primates.

Zero as a quantitative category cannot only be implemented in the layered neocortex of primates, but also in the anatomically distinct endbrain circuitries of birds that evolved based on convergent evolution.

The conception of “nothing” as number “zero” is celebrated as one of the greatest achievements in mathematics. To explore whether precursors of zero-like concepts can be found in vertebrates with a cerebrum that anatomically differs starkly from our primate brain, we investigated this in carrion crows. We show that crows can grasp the empty set as a null numerical quantity that is mentally represented next to number one. Moreover, we show that single neurons in an associative avian cerebral region specifically respond to the empty set and show the same physiological characteristics as for countable quantities. This suggests that zero as a quantitative category can also be implemented in the anatomically distinct endbrain circuitries of birds that evolved based on convergent evolution.

[Keywords: corvid, songbird, single-neuron recordings, nidopallium caudolaterale, numbers, empty set]

“MathBERT: A Pre-Trained Model for Mathematical Formula Understanding”, Peng et al 2021

“MathBERT: A Pre-Trained Model for Mathematical Formula Understanding”⁠, Shuai Peng, Ke Yuan, Liangcai Gao, Zhi Tang (2021-05-02; ; similar):

Large-scale pre-trained models like BERT⁠, have obtained a great success in various Natural Language Processing (NLP) tasks, while it is still a challenge to adapt them to the math-related tasks. Current pre-trained models neglect the structural features and the semantic correspondence between formula and its context. To address these issues, we propose a novel pre-trained model, namely MathBERT, which is jointly trained with mathematical formulas and their corresponding contexts. In addition, in order to further capture the semantic-level structural features of formulas, a new pre-training task is designed to predict the masked formula substructures extracted from the Operator Tree (OPT), which is the semantic structural representation of formulas. We conduct various experiments on three downstream tasks to evaluate the performance of MathBERT, including mathematical information retrieval, formula topic classification and formula headline generation. Experimental results demonstrate that MathBERT significantly outperforms existing methods on all those three tasks. Moreover, we qualitatively show that this pre-trained model effectively captures the semantic-level structural information of formulas. To the best of our knowledge, MathBERT is the first pre-trained model for mathematical formula understanding.

“Constructions in Combinatorics via Neural Networks”, Wagner 2021

“Constructions in combinatorics via neural networks”⁠, Adam Zsolt Wagner (2021-04-29; ; similar):

We demonstrate how by using a reinforcement learning algorithm, the deep cross-entropy method, one can find explicit constructions and counterexamples to several open conjectures in extremal combinatorics and graph theory.

Amongst the conjectures we refute are a question of Brualdi and Cao about maximizing permanents of pattern avoiding matrices, and several problems related to the adjacency and distance eigenvalues of graphs.

“Are NLP Models Really Able to Solve Simple Math Word Problems?”, Patel et al 2021

“Are NLP Models really able to Solve Simple Math Word Problems?”⁠, Arkil Patel, Satwik Bhattamishra, Navin Goyal (2021-03-12; ; backlinks; similar):

The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in the test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MWPs containing one-unknown arithmetic word problems, such problems are often considered “solved” with the bulk of research attention moving to more complex MWPs.

In this paper, we restrict our attention to English MWPs taught in grades four and lower. We provide strong evidence that the existing MWP solvers rely on shallow heuristics to achieve high performance on the benchmark datasets. To this end, we show that MWP solvers that do not have access to the question asked in the MWP can still solve a large fraction of MWPs. Similarly, models that treat MWPs as bag-of-words can also achieve surprisingly high accuracy.

Further, we introduce a challenge dataset, SVAMP, created by applying carefully chosen variations over examples sampled from existing datasets.

The best accuracy achieved by state-of-the-art models is substantially lower on SVAMP, thus showing that much remains to be done even for the simplest of the MWPs.

“How the Slowest Computer Programs Illuminate Math’s Fundamental Limits: The Goal of the 'busy Beaver' Game Is to Find the Longest-running Computer Program. Its Pursuit Has Surprising Connections to Some of the Most Profound Questions and Concepts in Mathematics”, Pavlus 2020

“How the Slowest Computer Programs Illuminate Math’s Fundamental Limits: The goal of the 'busy beaver' game is to find the longest-running computer program. Its pursuit has surprising connections to some of the most profound questions and concepts in mathematics”⁠, John Pavlus (2020-12-10; ⁠, ; similar):

“In math, there is a very permeable boundary between what’s an amusing recreation and what is actually important”, said Scott Aaronson⁠, a theoretical computer scientist at the University of Texas, Austin who recently published a survey of progress in “BusyBeaverology.” The recent work suggests that the search for long-running computer programs can illuminate the state of mathematical knowledge, and even tell us what’s knowable. According to researchers, the busy beaver game provides a concrete benchmark for evaluating the difficulty of certain problems, such as the unsolved Goldbach conjecture and Riemann hypothesis⁠. It even offers a glimpse of where the logical bedrock underlying math breaks down. The logician Kurt Gödel proved the existence of such mathematical terra incognita nearly a century ago. But the busy beaver game can show where it actually lies on a number line, like an ancient map depicting the edge of the world.

…For instance, if you’re only allowed one rule, and you want to ensure that the Turing machine halts, you’re forced to include the halt instruction right away. The busy beaver number of an one-rule machine, or BB(1), is therefore 1. But adding just a few more rules instantly blows up the number of machines to consider. Of 6,561 possible machines with two rules, the one that runs the longest—six steps—before halting is the busy beaver. But some others simply run forever. None of these are the busy beaver, but how do you definitively rule them out? Turing proved that there’s no way to automatically tell whether a machine that runs for a thousand or a million steps won’t eventually terminate.

That’s why finding busy beavers is so hard. There’s no general approach for identifying the longest-running Turing machines with an arbitrary number of instructions; you have to puzzle out the specifics of each case on its own. In other words, the busy beaver game is, in general, “uncomputable.” Proving that BB(2) = 6 and that BB(3) = 21 was difficult enough that Radó’s student Shen Lin earned a doctorate for the work in 1965. Radó considered BB(4) “entirely hopeless”, but the case was finally solved in 1983⁠. Beyond that, the values virtually explode; researchers have identified a five-rule Turing machine, for instance, that runs for 47,176,870 steps before stopping, so BB(5) is at least that big. BB(6) is at least 7.4 × 1036,534. Proving the exact values “will need new ideas and new insights, if it can be done at all”, said Aaronson.

…The Goldbach conjecture, for instance, asks whether every even integer greater than 2 is the sum of two primes. Proving the conjecture true or false would be an epochal event in number theory, allowing mathematicians to better understand the distribution of prime numbers. In 2015, an anonymous GitHub user named Code Golf Addict published code for a 27-rule Turing machine that halts if—and only if—the Goldbach conjecture is false. It works by counting upward through all even integers greater than 4; for each one, it grinds through all the possible ways to get that integer by adding two others, checking whether the pair is prime. When it finds a suitable pair of primes, it moves up to the next even integer and repeats the process. If it finds an even integer that can’t be summed by a pair of prime numbers, it halts. Running this mindless machine isn’t a practical way to solve the conjecture, because we can’t know if it will ever halt until it does. But the busy beaver game sheds some light on the problem. If it were possible to compute BB(27), that would provide a ceiling on how long we’d have to wait for the Goldbach conjecture to be settled automatically. That’s because BB(27) corresponds to the maximum number of steps this 27-rule Turing machine would have to execute in order to halt (if it ever did). If we knew that number, we could run the Turing machine for exactly that many steps. If it halted by that point, we’d know the Goldbach conjecture was false. But if it went that many steps and didn’t halt, we’d know for certain that it never would—thus proving the conjecture true…In 2016, Aaronson established a similar result in collaboration with Yuri Matiyasevich and Stefan O’Rear. They identified a 744-rule Turing machine that halts if and only if the Riemann hypothesis is false

…In 2016, he and his graduate student Adam Yedidia specified a 7,910-rule Turing machine that would only halt if ZF set theory is inconsistent. This means BB(7,910) is a calculation that eludes the axioms of ZF set theory. Those axioms can’t be used to prove that BB(7,910) represents one number instead of another, which is like not being able to prove that 2 + 2 = 4 instead of 5…“So much of math can be encoded as a question of, ‘Does this Turing machine halt or not?’” Aaronson said. “If you knew all the busy beaver numbers, then you could settle all of those questions.”

“Generative Language Modeling for Automated Theorem Proving”, Polu & Sutskever 2020

“Generative Language Modeling for Automated Theorem Proving”⁠, Stanislas Polu, Ilya Sutskever (2020-09-07; ⁠, ; backlinks; similar):

We explore the application of transformer-based language models to automated theorem proving. This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans—the generation of original mathematical terms—might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance. GPT-f found new short proofs that were accepted into the main Metamath library, which is to our knowledge, the first time a deep-learning based system has contributed proofs that were adopted by a formal mathematics community. [Also notable: the benefits of pretraining on Arxiv etc⁠, despite likely including no or only redundant Metamath, and primarily natural language text, showing transfer learning of general math knowledge to abstract low-level formal proof language. See also: “PACT: Proof Artifact Co-training for Theorem Proving with Language Models”⁠, lean-gptf (for Lean), “SymbolicGPT: A Generative Transformer Model for Symbolic Regression”⁠, “Measuring Mathematical Problem Solving With the MATH Dataset”⁠/​“Measuring Coding Challenge Competence With APPS”⁠, “Learning to Prove Theorems by Learning to Generate Theorems”⁠, “TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning”]

“Mastering Chess and Shogi by Self-Play With a General Reinforcement Learning Algorithm”, Silver et al 2017

“Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”⁠, David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot et al (2017-12-05; ; backlinks; similar):

The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play.

In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.

“Startup Ideas”, Branwen 2017

Startup-ideas: “Startup Ideas”⁠, Gwern Branwen (2017-08-21; ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Proposals for new technologies, businesses, startups, satirical or serious.

There are no good websites for learning how to lipread, despite widespread hearing impairment and the aging of the US population. Looking at some numbers, it seems like a potentially profitable niche?

[Now exists as⁠.]

“Solving General Arithmetic Word Problems”, Roy & Roth 2016

“Solving General Arithmetic Word Problems”⁠, Subhro Roy, Dan Roth (2016-08-04; ; backlinks; similar):

This paper presents a novel approach to automatically solving arithmetic word problems.

This is the first algorithmic approach that can handle arithmetic problems with multiple steps and operations, without depending on additional annotations or predefined templates. We develop a theory for expression trees that can be used to represent and evaluate the target arithmetic expressions; we use it to uniquely decompose the target arithmetic problem to multiple classification problems; we then compose an expression tree, combining these with world knowledge through a constrained inference framework.

Our classifiers gain from the use of quantity schemas that supports better extraction of features.

Experimental results show that our method outperforms existing systems, achieving state of the art performance on benchmark datasets of arithmetic word problems.

“Too Good to Be True: When Overwhelming Evidence Fails to Convince”, Gunn et al 2016

“Too good to be true: when overwhelming evidence fails to convince”⁠, Lachlan J. Gunn, François Chapeau-Blondeau, Mark McDonnell, Bruce Davis, Andrew Allison, Derek Abbott et al (2016-01-05; backlinks; similar):

Is it possible for a large sequence of measurements or observations, which support a hypothesis, to counterintuitively decrease our confidence? Can unanimous support be too good to be true? The assumption of independence is often made in good faith, however rarely is consideration given to whether a systemic failure has occurred.

Taking this into account can cause certainty in a hypothesis to decrease as the evidence for it becomes apparently stronger. We perform a probabilistic Bayesian analysis of this effect with examples based on (i) archaeological evidence, (ii) weighing of legal evidence, and (iii) cryptographic primality testing.

We find that even with surprisingly low systemic failure rates high confidence is very difficult to achieve and in particular we find that certain analyses of cryptographically-important numerical tests are highly optimistic, underestimating their false-negative rate by as much as a factor of 2{80}.

“Random Gradient-Free Minimization of Convex Functions”, Nesterov & Spokoiny 2015

2015-nesterov.pdf: “Random Gradient-Free Minimization of Convex Functions”⁠, Yurii Nesterov, Vladimir Spokoiny (2015-11-30; ; backlinks; similar):

In this paper, we prove new complexity bounds for methods of convex optimization based only on computation of the function value.

The search directions of our schemes are normally distributed random Gaussian vectors. It appears that such methods usually need at most n times more iterations than the standard gradient methods, where n is the dimension of the space of variables. This conclusion is true for both nonsmooth and smooth problems. For the latter class, we present also an accelerated scheme with the expected rate of convergence 𝒪(n2k2), where k is the iteration counter. For stochastic optimization, we propose a zero-order scheme and justify its expected rate of convergence 𝒪(nk1⁄2). We give also some bounds for the rate of convergence of the random gradient-free methods to stationary points of nonconvex functions, for both smooth and nonsmooth cases.

Our theoretical results are supported by preliminary computational experiments. [see also evolution strategies]

“Mathematics in the Age of the Turing Machine”, Hales 2013

“Mathematics in the Age of the Turing Machine”⁠, Thomas Hales (2013-02-12; backlinks):

The article gives a survey of mathematical proofs that rely on computer calculations and formal proofs.

“One Man’s Modus Ponens”, Branwen 2012

Modus: “One Man’s Modus Ponens”⁠, Gwern Branwen (2012-05-01; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

One man’s modus ponens is another man’s modus tollens is a saying in Western philosophy encapsulating a common response to a logical proof which generalizes the reductio ad absurdum and consists of rejecting a premise based on an implied conclusion. I explain it in more detail, provide examples, and a Bayesian gloss.

A logically-valid argument which takes the form of a modus ponens may be interpreted in several ways; a major one is to interpret it as a kind of reductio ad absurdum, where by ‘proving’ a conclusion believed to be false, one might instead take it as a modus tollens which proves that one of the premises is false. This “Moorean shift” is aphorized as the snowclone⁠, “One man’s modus ponens is another man’s modus tollens”.

The Moorean shift is a powerful counter-argument which has been deployed against many skeptical & metaphysical claims in philosophy, where often the conclusion is extremely unlikely and little evidence can be provided for the premises used in the proofs; and it is relevant to many other debates, particularly methodological ones.

“On the Distribution of Time-to-proof of Mathematical Conjectures”, Hisano & Sornette 2012

“On the distribution of time-to-proof of mathematical conjectures”⁠, Ryohei Hisano, Didier Sornette (2012-02-17; backlinks; similar):

What is the productivity of Science? Can we measure an evolution of the production of mathematicians over history? Can we predict the waiting time till the proof of a challenging conjecture such as the P-versus-NP problem? Motivated by these questions, we revisit a suggestion published recently and debated in the “New Scientist” that the historical distribution of time-to-proof’s, i.e., of waiting times between formulation of a mathematical conjecture and its proof, can be quantified and gives meaningful insights in the future development of still open conjectures. We find however evidence that the mathematical process of creation is too much non-stationary, with too little data and constraints, to allow for a meaningful conclusion. In particular, the approximate unsteady exponential growth of human population, and arguably that of mathematicians, essentially hides the true distribution. Another issue is the incompleteness of the dataset available. In conclusion we cannot really reject the simplest model of an exponential rate of conjecture proof with a rate of 0.01/​year for the dataset that we have studied, translating into an average waiting time to proof of 100 years. We hope that the presented methodology, combining the mathematics of recurrent processes, linking proved and still open conjectures, with different empirical constraints, will be useful for other similar investigations probing the productivity associated with mankind growth and creativity.

“Vividness in Mathematics and Narrative”, Gowers 2012-page-11

2012-gowers.pdf#page=11: “Vividness in Mathematics and Narrative”⁠, Timothy Gowers (2012-01-01; backlinks)

“How to Write a 21st Century Proof”, Lamport 2011

“How to Write a 21st Century Proof”⁠, Leslie Lamport (2011-11; backlinks; similar):

I was invited to give a talk at a celebration of the 80th birthday of Richard Palais. It was at a celebration of his 60th birthday that I first gave a talk about how to write a proof—a talk that led to [101]. So, I thought it would be fun to give the same talk, updated to reflect my 20 years of experience writing structured proofs. The talk was received much more calmly than my earlier one, and the mathematicians were open to considering that I might have something interesting to say about writing proofs. Perhaps in the last 20 years I have learned to be more persuasive, or perhaps the mathematicians in the audience had just grown older and calmer. In any case, they were still not ready to try changing how they write their own proofs.

My experience preparing and giving the talk made me realize it was time for a new paper on the subject. This paper is built around a simple example—a lemma from Michael Spivak’s calculus text. I tried to show how a mathematician can easily transform the proofs she now writes into structured proofs. The paper also briefly describes how formal structured proofs are written in TLA+, and an appendix contains a machine-checked proof of Spivak’s lemma. While mathematicians will not write formal proofs in the foreseeable future, I argue that learning how to write them is a good way to learn how to write rigorous informal proofs.

“Fermi Problem: Power Developed at the Eruption of the Puyehue-Cordón Caulle Volcanic System in June 2011”, Asorey & Dávalos 2011

“Fermi Problem: Power developed at the eruption of the Puyehue-Cordón Caulle volcanic system in June 2011”⁠, Hernan Asorey, Arturo López Dávalos (2011-09-06; backlinks; similar):

On June 4 2011 the Puyehue-Cordón Caulle volcanic system produced a pyroclastic subplinian eruption reaching level 3 in the volcanic explosivity index. The first stage of the eruption released sand and ashes that affected small towns and cities in the surrounding areas, including San Carlos de Bariloche, in Argentina, one of the largest cities in the North Patagonian andean region. By treating the eruption as a Fermi problem⁠, we estimated the volume and mass of sand ejected as well as the energy and power released during the eruptive phase. We then put the results in context by comparing the obtained values with everyday quantities, like the load of a cargo truck or the electric power produced in Argentina. These calculations have been done as a pedagogic exercise, and after evaluation of the hypothesis was done in the classroom, the calculations have been performed by the students. These are students of the first physics course at the Physics and Chemistry Teacher Programs of the Universidad Nacional de Rı́o Negro.

“Silk Road 1: Theory & Practice”, Branwen 2011

Silk-Road: “Silk Road 1: Theory & Practice”⁠, Gwern Branwen (2011-07-11; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

History, background, visiting, ordering, using, & analyzing the drug market Silk Road 1

The cypherpunk movement laid the ideological roots of Bitcoin and the online drug market Silk Road; balancing previous emphasis on cryptography, I emphasize the non-cryptographic market aspects of Silk Road which is rooted in cypherpunk economic reasoning, and give a fully detailed account of how a buyer might use market information to rationally buy, and finish by discussing strengths and weaknesses of Silk Road, and what future developments are predicted by cypherpunk ideas.

“The Replication Crisis: Flaws in Mainstream Science”, Branwen 2010

Replication: “The Replication Crisis: Flaws in Mainstream Science”⁠, Gwern Branwen (2010-10-27; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

2013 discussion of how systemic biases in science, particularly medicine and psychology, have resulted in a research literature filled with false positives and exaggerated effects, called ‘the Replication Crisis’.

Long-standing problems in standard scientific methodology have exploded as the “Replication Crisis”: the discovery that many results in fields as diverse as psychology, economics, medicine, biology, and sociology are in fact false or quantitatively highly inaccurately measured. I cover here a handful of the issues and publications on this large, important, and rapidly developing topic up to about 2013, at which point the Replication Crisis became too large a topic to cover more than cursorily. (A compilation of some additional links are provided for post-2013 developments.)

The crisis is caused by methods & publishing procedures which interpret random noise as important results, far too small datasets, selective analysis by an analyst trying to reach expected/​desired results, publication bias, poor implementation of existing best-practices, nontrivial levels of research fraud, software errors, philosophical beliefs among researchers that false positives are acceptable, neglect of known confounding like genetics, and skewed incentives (financial & professional) to publish ‘hot’ results.

Thus, any individual piece of research typically establishes little. Scientific validation comes not from small p-values, but from discovering a regular feature of the world which disinterested third parties can discover with straightforward research done independently on new data with new procedures—replication.

“Miscellaneous”, Branwen 2009

Notes: “Miscellaneous”⁠, Gwern Branwen (2009-08-05; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Misc thoughts, memories, proto-essays, musings, etc.

We usually clean up after ourselves, but sometimes, we are expected to clean before (ie. after others) instead. Why?

Because in those cases, pre-cleanup is the same amount of work, but game-theoretically better whenever a failure of post-cleanup would cause the next person problems.

“Desperately Seeking Mathematical Proof”, Nathanson 2009

“Desperately seeking mathematical proof”⁠, Melvyn B. Nathanson (2009-05-22; backlinks):

Remarks on mathematical proof and the practice of mathematics.

“Modafinil”, Branwen 2009

Modafinil: “Modafinil”⁠, Gwern Branwen (2009-02-20; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Effects, health concerns, suppliers, prices & rational ordering.

Modafinil is a prescription stimulant drug. I discuss informally, from a cost-benefit-informed perspective, the research up to 2015 on modafinil’s cognitive effects, the risks of side-effects and addiction/​tolerance and law enforcement, and give a table of current grey-market suppliers and discuss how to order from them.

“Probing the Improbable: Methodological Challenges for Risks With Low Probabilities and High Stakes”, Ord et al 2008

“Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes”⁠, Toby Ord, Rafaela Hillerbrand, Anders Sandberg (2008-10-30; ; backlinks; similar):

Some risks have extremely high stakes. For example, a worldwide pandemic or asteroid impact could potentially kill more than a billion people. Comfortingly, scientific calculations often put very low probabilities on the occurrence of such catastrophes. In this paper, we argue that there are important new methodological problems which arise when assessing global catastrophic risks and we focus on a problem regarding probability estimation.

When an expert provides a calculation of the probability of an outcome, they are really providing the probability of the outcome occurring, given that their argument is watertight. However, their argument may fail for a number of reasons such as a flaw in the underlying theory, a flaw in the modeling of the problem, or a mistake in the calculations. If the probability estimate given by an argument is dwarfed by the chance that the argument itself is flawed, then the estimate is suspect.

We develop this idea formally, explaining how it differs from the related distinctions of model and parameter uncertainty. Using the risk estimates from the Large Hadron Collider as a test case, we show how serious the problem can be when it comes to catastrophic risks and how best to address it.

“The Epic Story of Maximum Likelihood”, Stigler 2007

“The Epic Story of Maximum Likelihood”⁠, Stephen M. Stigler (2007-11; backlinks; similar):

At a superficial level, the idea of maximum likelihood must be prehistoric: early hunters and gatherers may not have used the words “method of maximum likelihood” to describe their choice of where and how to hunt and gather, but it is hard to believe they would have been surprised if their method had been described in those terms. It seems a simple, even unassailable idea: Who would rise to argue in favor of a method of minimum likelihood, or even mediocre likelihood? And yet the mathematical history of the topic shows this “simple idea” is really anything but simple. Joseph Louis Lagrange, Daniel Bernoulli, Leonard Euler, Pierre Simon Laplace and Carl Friedrich Gauss are only some of those who explored the topic, not always in ways we would sanction today. In this article, that history is reviewed from back well before Fisher to the time of Lucien Le Cam’s dissertation. In the process Fisher’s unpublished 1930 characterization of conditions for the consistency and efficiency of maximum likelihood estimates is presented, and the mathematical basis of his three proofs discussed. In particular, Fisher’s derivation of the information inequality is seen to be derived from his work on the analysis of variance⁠, and his later approach via estimating functions was derived from Euler’s Relation for homogeneous functions. The reaction to Fisher’s work is reviewed, and some lessons drawn.

[Keywords: R. A. Fisher⁠, Karl Pearson, Jerzy Neyman, Harold Hotelling, Abraham Wald⁠, maximum likelihood, sufficiency, efficiency, superefficiency, history of statistics]

“Overhang”, Paterson & Zwick 2007

“Overhang”⁠, Mike Paterson, Uri Zwick (2007-10-12; backlinks):

How far off the edge of the table can we reach by stacking n identical, homogeneous, frictionless blocks of length 1?

A classical solution achieves an overhang of 1⁄2Hn, where Hn ln n is the nth harmonic number⁠. This solution is widely believed to be optimal.

We show, however, that it is, in fact, exponentially far from optimality by constructing simple n-block stacks that achieve an overhang of cn1⁄3, for some constant c > 0.

“The Monotype 4-Line System for Setting Mathematics”, Rhatigan 2007

2007-rhatigan.pdf: “The Monotype 4-Line System for Setting Mathematics”⁠, Daniel Rhatigan (2007-08-13; ; similar):

[Description of the most advanced mechanical typesetting system for the challenging task of typesetting mathematics (which high-quality typography is what Knuth aimed to recreate).

To provide the typographic quality of hand-set math but at an affordable cost, the Monotype corporation made a huge investment post-WWII into enhancing its mechanical hot metal typesetting system into one which would encode every mathematical equation into symbols placed on a vertical grid of 4 horizontal ‘lines’, into which could be slotted entries from a vast new family of fonts & symbols, all tweaked to fit in various positions, which would then be spat out by the machine into a single solid lead piece which could be combined with the rest to form a single page.

This allowed a skilled operator to rapidly ‘type’ his way through a page of math to yield a beautiful custom output without endlessly tedious hand-arranging lots of little metal bits.]

“Maximum Overhang”, Paterson et al 2007

“Maximum overhang”⁠, Mike Paterson, Yuval Peres, Mikkel Thorup, Peter Winkler, Uri Zwick (2007-07-01; backlinks):

How far can a stack of n identical blocks be made to hang over the edge of a table? The question dates back to at least the middle of the 19th century and the answer to it was widely believed to be of order log n.

Recently, Paterson & Zwick constructed n-block stacks with overhangs of order n1⁄3, exponentially better than previously thought possible.

We show here that order n1⁄3is indeed best possible, resolving the long-standing overhang problem up to a constant factor.

“Béla Bollobás: Graphs Extremal and Random [Interview of Béla Bollobás by Y. K. Leong]”, Leong & Bollobás 2007

2007-leong.pdf: “Béla Bollobás: Graphs Extremal and Random [Interview of Béla Bollobás by Y. K. Leong]”⁠, Y. K. Leong, Béla Bollobás (2007-01-01)

“Computational Discovery in Pure Mathematics”, Colton 2007

2007-colton.pdf: “Computational Discovery in Pure Mathematics”⁠, Simon Colton (2007; backlinks; similar):

We discuss what constitutes knowledge in pure mathematics and how new advances are made and communicated. We describe the impact of computer algebra systems, automated theorem provers, programs designed to generate examples, mathematical databases, and theory formation programs on the body of knowledge in pure mathematics. We discuss to what extent the output from certain programs can be considered a discovery in pure mathematics. This enables us to assess the state of the art with respect to Newell and Simon’s prediction that a computer would discover and prove an important mathematical theorem.

“EWD1300: The Notational Conventions I Adopted, and Why”, Dijkstra 2002

2002-dijkstra.pdf: “EWD1300: The Notational Conventions I Adopted, and Why”⁠, Edsger W. Dijkstra (2002-12-01; similar):

At a given moment, the concept of polite mathematics emerged, the underlying idea of which is that, even if you have only 60 readers, it pays to spend an hour if by doing so you can save your average reader a minute. By inventing an idealized ‘average reader’, we could translate most of the lofty, human goal of politeness into more or less formal criteria we could apply to our texts. This note is devoted to the resulting notational and stylistic conventions that were adopted as the years went by.

We don’t want to baffle or puzzle our reader, in particular it should be clear what has to be done to check our argument and it should be possible to do so without pencil and paper. This dictates small, explicit steps. On the other hand it is well known that brevity is the leading characteristic of mathematical elegance, and some fear that this ideal excludes the small, explicit steps, but one of the joys of my professional life has been the discovery that this fear is unfounded, for brevity can be achieved without committing the sin of omission.

I should point out that my ideal of crisp clarity is not universally shared. Some consider the puzzles that are created by their omissions as spicy challenges, without which their texts would be boring; others shun clarity lest their work is considered trivial.

“Hymne to Hymen”, Descartes & Smith 2002

2002-descartes.pdf: “Hymne to Hymen”⁠, B. Descartes, C. A. B. Smith (2002-01-01)

“Making Mathematics: The Coffee Connection”, Wieschenberg 1999

1999-wieschenberg.pdf: “Making Mathematics: The Coffee Connection”⁠, Agnes Arvai Wieschenberg (1999-06-01; ; backlinks)

“How Did Software Get so Reliable without Proof?”, Hoare 1996

1996-hoare.pdf: “How did software get so reliable without proof?”⁠, C. A. R. Hoare (1996-03; backlinks; similar):

By surveying current software engineering practice, this paper reveals that the techniques employed to achieve reliability are little different from those which have proved effective in all other branches of modern engineering: rigorous management of procedures for design inspection and review; quality assurance based on a wide range of targeted tests; continuous evolution by removal of errors from products already in widespread use; and defensive programming, among other forms of deliberate over-engineering. Formal methods and proof play a small direct role in large scale programming; but they do provide a conceptual framework and basic understanding to promote the best of current practice, and point directions for future improvement.

“A Visit to Hungarian Mathematics”, Hersh & John-Steiner 1993

1993-hersh.pdf: “A visit to Hungarian mathematics”⁠, Reuben Hersh, Vera John-Steiner (1993-01-01)

Envisioning Information: Chapter 5, ‘Color and Information’, Pg83-86 [on Oliver Byrne's Color Diagram Version of Euclid's Elements]”, Tufte 1990

1990-tufte-envisioninginformation-ch5-byrneseuclid.pdf: Envisioning Information: chapter 5, ‘Color and Information’, pg83-86 [on Oliver Byrne's color diagram version of Euclid's Elements]”⁠, Edward Tufte (1990; ; backlinks; similar):

[Extracts from Tufte textbook on graphing information and visual design, where he revives & popularizes Oliver Byrne’s obscure edition of Euclid⁠.

Tufte notes how effectively Bryne converts lengthy formal text proofs (intended for recitation) into short sequences of cleanly-designed diagrams exploiting primary colors for legibility, and the curious anticipation of modernist design movements like De Stijl⁠.

This inspired 2 digital recreations by Slyusarev Sergey & Nicholas Rougeux⁠.]

“Dynamical Systems That Sort Lists, Diagonalize Matrices and Solve Linear Programming Problems”, Brockett 1988

1988-brockett.pdf: “Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems”⁠, R. W. Brockett (1988-12-07; similar):

We establish a number of properties associated with the dynamical system ̇H = [H, [H, N]], where H and N are symmetric n by n matrices and [A, B] = AB − BA. The most important of these come from the fact that this equation is equivalent to a certain gradient flow on the space of orthogonal matrices.

Particular emphasis is placed on the role of this equation as an analog computer. For example, it is shown how to map the data associated with a linear programming problem into H(0) and N in such a way as to have ̇H = [H[H, N]] evolve to a solution of the linear programming problem.

This result can be applied to find systems that solve a variety of generic combinatorial optimization problems, and it also provides an algorithm for diagonalizing symmetric matrices.

“FRACTRAN: A Simple Universal Programming Language for Arithmetic”, Conway 1987

1987-conway.pdf: “FRACTRAN: A Simple Universal Programming Language for Arithmetic”⁠, John H. Conway (1987; ; backlinks; similar):

To play the fraction game corresponding to a given list

f1, f2, …, fk

of fractions and starting integer N, you repeatedly multiply the integer you have at any stage (initially N) by the earliest fi in the list for which the answer is integral. Whenever there is no such fi, the game stops.

“Review of Manin Yu. I.. _A Course in Mathematical Logic_”, Boolos 1986

1986-boolos.pdf: “Review of Manin Yu. I.. _A course in mathematical logic_”⁠, George Boolos (1986-01-01; backlinks)

“Discrete Hartley Transform”, Bracewell 1983

1983-bracewell.pdf: “Discrete Hartley transform”⁠, R. N. Bracewell (1983-12-01; similar):

The discrete Hartley transform (DHT) resembles the discrete Fourier transform (DFT) but is free from 2 characteristics of the DFT that are sometimes computationally undesirable.

The inverse DHT is identical with the direct transform, and so it is not necessary to keep track of the +i and −i versions as with the DFT. Also, the DHT has real rather than complex values and thus does not require provision for complex arithmetic or separately managed storage for real and imaginary parts. Nevertheless, the DFT is directly obtainable from the DHT by a simple additive operation.

In most image-processing applications the convolution of 2 data sequences f1 and f2 is given by DHT of [(DHT of f1) × (DHT of f2)], which is a rather simpler algorithm than the DFT permits, especially if images are to be manipulated in 2 dimensions. It permits faster computing. Since the speed of the fast Fourier transform depends on the number of multiplications, and since one complex multiplication equals 4 real multiplications, a fast Hartley transform also promises to speed up Fourier-transform calculations.

The name “discrete Hartley transform” is proposed because the DHT bears the same relation to an integral transform described by Hartley [R. V. L. Hartley, Proc. IRE 30, 144 (1942)] as the DFT bears to the Fourier transform.

“Are Impossible Figures Possible?”, Kulpa 1983

1983-kulpa.pdf: “Are impossible figures possible?”⁠, Zenon Kulpa (1983-05-01; ; similar):

In the paper, a thorough analysis of the so-called impossible figures phenomenon is attempted. The notion of an impossible figure and some other related phenomena (eg. ‘likely’ and ‘unlikely’ figures) are precisely defined and analyzed.

It is shown that all these figures, being illusions of spatial interpretation of pictures, are more relevant to psychology of vision (and related artificial intelligence research) than to geometry or mathematics in general. It suggests an inadequacy of several previous formal approaches to explain these phenomena and to deal with them in computer vision programs.

The analysis of these spatial interpretation illusions allows us to formulate several properties of the structure of our spatial interpretation mechanism. A 2-stage structure of this mechanism, a set of basic ‘interpretation assumptions’ and a set of basic ‘impossibility causes’ are identified as a result.

[Keywords: impossible figures, visual illusions, spatial (3-D) interpretation of pictures, computer vision]

“Heaviside’s Operational Calculus and the Attempts to Rigorise It”, Lützen 1979

1979-lutzen.pdf: “Heaviside’s Operational Calculus and the Attempts to Rigorise It”⁠, Jesper Lützen (1979-01-01)

“Some Proposals for Reviving the Philosophy of Mathematics”, Hersh 1979

1979-hersh.pdf: “Some Proposals for Reviving the Philosophy of mathematics”⁠, Reuben Hersh (1979-01-01)

“Social Processes and Proofs of Theorems and Programs”, Millo et al 1979

1979-demillo.pdf: “Social Processes and Proofs of Theorems and Programs”⁠, Richard A. De Millo, Richard J. Lipton, Alan J. Perlis (1979; backlinks; similar):

Many people have argued that computer programming should strive to become more like mathematics. Maybe so, but not in the way they seem to think. The aim of program verification, an attempt to make programming more mathematics-like, is to increase dramatically one’s confidence in the correct functioning of a piece of software, and the device that verifiers use to achieve this goal is a long chain of formal, deductive logic. In mathematics, the aim is to increase one’s confidence in the correctness of a theorem, and it’s true that one of the devices mathematicians could in theory use to achieve this goal is a long chain of formal logic. But in fact they don’t. What they use is a proof, a very different animal. Nor does the proof settle the matter; contrary to what its name suggests, a proof is only one step in the direction of confidence. We believe that, in the end, it is a social process that determines whether mathematicians feel confident about a theorem—and we believe that, because no comparable social process can take place among program verifiers, program verification is bound to fail. We can’t see how it’s going to be able to affect anyone’s confidence about programs.

“Life at Low Reynolds Number”, Purcell 1977

1977-purcell.pdf: “Life at low Reynolds number”⁠, E. M. Purcell (1977; ⁠, ; similar):

[Physics discussion by Edward Mills Purcell prompted by Victor Weisskopf⁠, on the topic of fluid mechanics & viscous liquids with low Reynolds number⁠, particularly for microorganisms.

At the small scale, things become strange and any motion is difficult: inertia is irrelevant, as it is proportional to a man swimming in a pool of molasses who can move 1 centimeter a minute.

Worse, only certain kinds of motion will result in any progress: a symmetrical or “reciprocal motion” such as a scallop opening & close will merely move in place! A scallop would need 2 hinges to go anywhere. To move requires stranger approaches, like a “flexible oar” which can bend one way and then another, or a corkscrew motion.

How do bacteria like E. coli move? With a flagellum or a cilium: a flagellum does not wiggle like everyone thought, but literally rotates on a gear (like a little machine), and so it can move in the goop that is small-scale water.

The motion is slow, and extremely inefficient, but that doesn’t matter. A bigger problem is diffusion: in a viscous medium, stirring things up around you does nothing, because the fluid won’t move much compared to normal diffusion. Your are still surrounded by your waste products & limited to what you can eat around you. Moving yourself is too slow to help either: diffusion is much faster.

So why move at all? To find greener pastures where diffusion will bring you more stuff than where you were before. And if you are going to move at all, you might as well move far enough to outrun diffusion and get a meaningful difference. This explains how and why bacteria move—it makes little sense if your intuitions are formed on the large-scale, but makes sense down there in the micro-scale of low Reynolds number fluids.]

“Randomness and Mathematical Proof”, Chatin 1975

1975-chaitin.pdf: “Randomness and Mathematical Proof”⁠, Gregory Chatin (1975-01-01; ; backlinks)

“The Dangers of Computer-Science Theory”, Knuth 1973

1973-knuth.pdf: “The Dangers of Computer-Science Theory”⁠, Donald E. Knuth (1973; ; backlinks; similar):

This chapter discusses the difficulties associated with the computer-science theories.

The theory of automata is slowly changing to a study of random-access computations, and this work promises to be more useful. Any algorithm programmable on a certain kind of pushdown automaton can be performed efficiently on a random-access machine, no matter how slowly the pushdown program runs.

Another difficulty with the theory of languages is that it has led to an overemphasis on syntax as opposed to semantics. For many years there was much light on syntax and very little on semantics; so simple semantic constructions were unnaturally grafted onto syntactic definitions, making rather unwieldy grammars, instead of searching for theories more appropriate to semantics.

Theories are often more structured and more interesting when they are based on real problems; somehow they are more exciting than completely abstract theories will ever be.

“Fidelity in Mathematical Discourse: Is One and One Really Two?”, Davis 1972

1972-davis.pdf: “Fidelity in Mathematical Discourse: Is One and One Really Two?”⁠, Philip J. Davis (1972-01-01)

“The Humble Programmer [EWD340]”, Dijkstra 1972

“The Humble Programmer [EWD340]”⁠, Edsger W. Dijkstra (1972; ; backlinks):

[ACM Turing Lecture 1972 by famously opinionated mathematician Edsger W. Dijkstra]

…I had to make up my mind, either to stop programming and become a real, respectable theoretical physicist, or to carry my study of physics to a formal completion only, with a minimum of effort, and to become…, yes what? A programmer? But was that a respectable profession? For after all, what was programming? Where was the sound body of knowledge that could support it as an intellectually respectable discipline? I remember quite vividly how I envied my hardware colleagues, who, when asked about their professional competence, could at least point out that they knew everything about vacuum tubes, amplifiers and the rest, whereas I felt that, when faced with that question, I would stand empty-handed. Full of misgivings I knocked on van Wijngaarden’s office door, asking him whether I could “speak to him for a moment”; when I left his office a number of hours later, I was another person. For after having listened to my problems patiently, he agreed that up till that moment there was not much of a programming discipline, but then he went on to explain quietly that automatic computers were here to stay, that we were just at the beginning and could not I be one of the persons called to make programming a respectable discipline in the years to come? This was a turning point in my life and I completed my study of physics formally as quickly as I could.

…To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming had become an equally gigantic problem. In this sense the electronic industry has not solved a single problem, it has only created them, it has created the problem of using its products. To put it in another way: as the power of available machines grew by a factor of more than a thousand, society’s ambition to apply these machines grew in proportion, and it was the poor programmer who found his job in this exploded field of tension between ends and means. The increased power of the hardware, together with the perhaps even more dramatic increase in its reliability, made solutions feasible that the programmer had not dared to dream about a few years before.

…Automatic computers have now been with us for a quarter of a century. They have had a great impact on our society in their capacity of tools, but in that capacity their influence will be but a ripple on the surface of our culture, compared with the much more profound influence they will have in their capacity of intellectual challenge without precedent in the cultural history of mankind.

“Assigning Probabilities to Logical Formulas”, Scott & Krauss 1966

1966-scott.pdf: “Assigning Probabilities to Logical Formulas”⁠, Dana Scott, Peter Krauss (1966; similar):

Probability concepts nowadays are presented in the standard framework of the Kolmogorov axioms⁠. A sample space is given together with an σ-field of subsets, the events, and an σ-additive probability measure defined on this σ-field.

It is more natural in many situations to assign probabilities to statements rather than sets. It may be mathematically useful to translate everything into a set-theoretical formulation, but the step is not always necessary or even helpful. The main task is to carry over the standard concepts from ordinary logic to what might be called “probability logic.” Indeed ordinary logic is a special case: the assignment of truth values to formulas can be viewed as assigning probabilities that are either 0 (for false) or 1 (for true).

In a sense, the symmetric probability systems are opposite to ordinary relational systems.

“Non-Cooperative Games”, Nash 1951

1951-nash.pdf: “Non-Cooperative Games”⁠, John Nash (1951-09-01; ⁠, ⁠, ; backlinks; similar):

…Our game theory⁠, in contradistinction, is based on the absence of coalitions in that it is assumed that each participant acts independently, without collaboration or communication with any of the others.

The notion of an equilibrium point is the basic ingredient in our theory. This notion yields a generalization of the concept of the solution of a 2-person zero-sum game. It turns out that the set of equilibrium points of a 2-person zero-sum game is simply the set of all pairs of opposing ‘good strategies’.

In the immediately following sections we shall define equilibrium points and prove that a finite non-cooperative game always has at least one equilibrium point. We shall also introduce the notions of solvability and strong solvability of a non-cooperative game and prove a theorem on the geometrical structure of the set of equilibrium points of a solvable game.

As an example of the application of our theory we include a solution of a simplified 3 person poker game.

An Essay On The Psychology Of Invention In The Mathematical Field”, Hadamard 1945

An Essay On The Psychology Of Invention In The Mathematical Field⁠, Jacques Hadamard (1945; ⁠, ; backlinks; similar):

[Relevant to an essay of mine on mathematical errorHadamard’s book is one of the classics in the area of mathematical discovery, mentioned along with Poincaré’s lecture⁠.

With due allowance for style and age, Hadamard ably describes and defends the basic model of ‘work, incubation⁠, illumination, verification’, with reference to his own discoveries, his many famous acquaintances, Poincaré’s lecture, and a very interesting survey of mathematicians. In fact, it’s a little depressing that we don’t seem to have gone much beyond that in the half-century since this was published back in 1945 or so. While at least we no longer need his defense of the unconscious as a meaningful part of cognition, much of the rest is depressingly familiar—for example, his acute observations on mental imagery & people who solely think in words, and mention of Francis Galton’s survey (little-known outside of psychology), could be usefully read by many who commit the typical mind fallacy⁠.

If Hadamard comes to no hard and fast conclusions, but merely raises many interesting points and criticizes a number of theories, we can hardly hold that against him, as we can do little better and so it becomes our failing to followup, not his.]

“A More Symmetrical Fourier Analysis Applied to Transmission Problems”, Hartley 1942

1942-hartley.pdf: “A More Symmetrical Fourier Analysis Applied to Transmission Problems”⁠, Ralph V. L. Hartley (1942-03-01; backlinks; similar):

The Fourier identity is here expressed in a more symmetrical form which leads to certain analogies between the function of the original variable and its transform. Also it permits a function of time, for example, to be analyzed into 2 independent sets of sinusoidal components, one of which is represented in terms of positive frequencies, and the other of negative.

The steady-state treatment of transmission problems in terms of this analysis is similar to the familiar ones and may be carried out either in terms of real quantities or of complex exponentials. In the transient treatment, use is made of the analogies referred to above, and their relation to the method of “paired echoes” is discussed.

A restatement is made of the condition which is known to be necessary in order that a given steady-state characteristic may represent a passive or stable active system (actual or ideal).

A particular necessary condition is deduced from this as an illustration.

“On a Problem of Formal Logic”, Ramsey 1930

1930-ramsey.pdf: “On a Problem of Formal Logic”⁠, Frank P. Ramsey (1930-01-01; ; backlinks)

“Operational Methods in Mathematical Physics”, Carslaw 1928

1928-carslaw.pdf: “Operational Methods in Mathematical Physics”⁠, H. S. Carslaw (1928-10-01; backlinks; similar):

This essay-review of Jeffreys’ very welcome and valuable Tract with the above title has been written at the editor’s request. Many readers of the Gazette must have heard of Heaviside’s operational method of solving the equations of dynamics and mathematical physics. If they have tried to learn about them from Heaviside’s own works, they have attempted a difficult task. Nothing more obscure than his mathematical writings is known to me. A Cambridge Tract is now at their disposal. Prom it much may be learned; but the air of mystery still—at least in part—remains.

“The Foundations of Mathematics”, Ramsey 1926

1926-ramsey.pdf: “The Foundations of Mathematics”⁠, Frank P. Ramsey (1926-01-01; ; backlinks)

“On Operators in Physical Mathematics. Part I”, Heaviside 1892

1892-heaviside.pdf: “On Operators in Physical Mathematics. Part I”⁠, Oliver Heaviside (1892-01-01)

Square packing in a square


Scottish Café


Scottish Book