“Why Philosophers Should Care About Computational Complexity”, (2011-08-08):
One might think that, once we know something is computable, how efficiently it can be computed is a practical question with little further philosophical importance. In this essay, I offer a detailed case that one would be wrong. In particular, I argue that computational complexity theory—the field that studies the resources (such as time, space, and randomness) needed to solve computational problems—leads to new perspectives on the nature of mathematical knowledge, the strong AI debate, computationalism, the problem of logical omniscience, Hume’s problem of induction, Goodman’s grue riddle, the foundations of quantum mechanics, economic rationality, closed timelike curves, and several other topics of philosophical interest. I end by discussing aspects of complexity theory itself that could benefit from philosophical analysis.
2012-rintanen.pdf: “Planning as satisfiability: Heuristics”, (2012-12; ):
Reduction to SAT is a very successful approach to solving hard combinatorial problems in Artificial Intelligence and computer science in general. Most commonly, problem instances reduced to SAT are solved with a general-purpose SAT solver. Although there is the obvious possibility of improving the process with application-specific heuristics, this has rarely been done successfully.
In this work we propose a planning-specific variable selection strategy for generic variable selection heuristics, such as VSIDS, and often lifts it to the same level with other search methods such as explicit state-space search with heuristic search algorithms.. The strategy is based on generic principles about properties of plans, and its performance with standard planning benchmarks often substantially improves on
1959-shannon.pdf: “Coding Theorems for a Discrete Source With a Fidelity Criterion”, (1959; ):
Consider a discrete source producing a sequence of message letters from a finite alphabet. A single-letter distortion measure is given by an non-negative matrix (dij). The entry dij measures the “cost” or “distortion” if letter i is reproduced at the receiver as letter j. The average distortion of a communications system (source-coder-noisy channel-decoder) is taken to be where Pij is the probability of i being reproduced as j. It is shown that there is a function R(d) that measures the “equivalent rate” of the source for a given level of distortion. For coding purposes where a level d of distortion can be tolerated, the source acts like one within information rate R(d). Methods are given for calculating R(d), and various properties discussed. Finally, generalizations to ergodic sources, to continuous sources, and to distortion measures involving blocks of letters are developed.
“You and Your Research”, (1986-03-07):
[Transcript of a talk by mathematician and Bell Labs manager Richard Hamming about what he had learned about computers and how to do effective research (republished in expanded form as Art of Doing Science and Engineering: Learning to Learn; 1995 video). It is one of the most famous and most-quoted such discussions ever.]
At a seminar in the Bell Communications Research Colloquia Series, Dr. Richard W. Hamming, a Professor at the Naval Postgraduate School in Monterey, California and a retired Bell Labs scientist, gave a very interesting and stimulating talk, ‘You and Your Research’ to an overflow audience of some 200 Bellcore staff members and visitors at the Morris Research and Engineering Center on March 7, 1986. This talk centered on Hamming’s observations and research on the question “Why do so few scientists make substantial contributions and so many are forgotten in the long run?” From his more than 40 years of experience, 30 of which were at Bell Laboratories, he has made a number of direct observations, asked very pointed questions of scientists about what, how, and why they did things, studied the lives of great scientists and great contributions, and has done introspection and studied theories of creativity. The talk is about what he has learned in terms of the properties of the individual scientists, their abilities, traits, working habits, attitudes, and philosophy.
“PlaNet—Photo Geolocation with Convolutional Neural Networks”, (2016-02-17):
Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en-masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
1973-knuth.pdf: “The Dangers of Computer-Science Theory”, (1973; ):
This chapter discusses the difficulties associated with the computer-science theories.
The theory of automata is slowly changing to a study of random-access computations, and this work promises to be more useful. Any algorithm programmable on a certain kind of pushdown automaton can be performed efficiently on a random-access machine, no matter how slowly the pushdown program runs.
Another difficulty with the theory of languages is that it has led to an overemphasis on syntax as opposed to semantics. For many years there was much light on syntax and very little on semantics; so simple semantic constructions were unnaturally grafted onto syntactic definitions, making rather unwieldy grammars, instead of searching for theories more appropriate to semantics.
Theories are often more structured and more interesting when they are based on real problems; somehow they are more exciting than completely abstract theories will ever be.
2015-hofman.pdf: “Evolution of the Human Brain: From Matter to Mind”, (2015; ):
Design principles and operational modes are explored that underlie the information processing capacity of the human brain.
The hypothesis is put forward that in higher organisms, especially in primates, the complexity of the neural circuitry of the cerebral cortex is the neural correlate of the brain’s coherence and predictive power, and, thus, a measure of intelligence. It will be argued that with the evolution of the human brain we have nearly reached the limits of biological intelligence.
[Keywords: biological intelligence, cognition, consciousness, cerebral cortex, primates, information processing, neural networks, cortical design, human brain evolution]
2016-silver.pdf#deepmind: “Mastering the game of Go with deep neural networks and tree search”, (2016-01-28; ):
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
[Anecdote: I hear from Groq that the original AlphaGo GPU implementation was not on track to defeat Lee Sedol by about a month before, when they happened to gamble on implementing TPUv1 support. The additional compute led to drastic performance gains, and the TPU model could beat the GPU model in ~98 of 100 games, and the final model solidly defeated Lee Sedol. (Since TPUv1s reportedly only did inferencing/forward-mode, presumably they were not used for the initial imitation learning, or the policy gradients self-play, but for generating the ~30 million self-play games which the value network was trained on (doing regression/prediction of ‘board → P(win)’, requiring no state or activations from the self-play games, just generating an extremely large corpus which could be easily used by training.]
1935-wechsler-rangeofhumancapacities.pdf: “The Range of Human Capacities”, (1935; ):
Edward Thorndike saw that the subjects who did well at the start of the training also improved faster as the training progressed compared with the subjects who began more slowly. “As a matter of fact”, Thorndike wrote, “in this experiment the larger individual differences increase with equal training, showing a positive correlation with high initial ability with ability to profit by training.” The passage from the Bible doesn’t quite capture Thorndike’s results accurately because every subject improved, but the rich got relatively richer. Everyone learned, but the learning rates were consistently different.
When World War I erupted, Thorndike became a member of the Committee on Classification of Personnel, a group of psychologists commissioned by the U.S. Army to evaluate recruits [see Strong 1918]. It was there that Thorndike rubbed off on a young man named David Wechsler, who had just finished his master’s degree in psychology. Wechsler, who would become a famous psychologist, developed a lifelong fascination with tracing the boundaries of humanity, from lower to upper limits.
In 1935, Wechsler compiled essentially all of the credible data in the world he could find on human measurements. He scoured measures of everything from vertical jump to the duration of pregnancies to the weight of the human liver and the speeds at which card punchers at a factory could punch their cards. He organized it all in the first edition of a book with the aptly momentous title The Range of Human Capacities.
Wechsler found that the ratio of the smallest to biggest, or best to worst, in just about any measure of humanity, from high jumping to hosiery looping [knitting], was between 2 to one and 3 to one. To Wechsler, the ratio appeared so consistent that he suggested it as a kind of universal rule of thumb.
Phillip Ackerman, a Georgia Tech psychologist and skill acquisition expert, is a sort of modern-day Wechsler, having combed the world’s skill-acquisition studies in an effort to determine whether practice makes equal, and his conclusion is that it depends on the task. In simple tasks, practice brings people closer together, but in complex ones, it often pulls them apart. Ackerman has designed computer simulations used to test air traffic controllers, and he says that people converge on a similar skill level with practice on the easy tasks—like clicking buttons to get planes to take off in order—but for the more complex simulations that are used for real-life controllers, “the individual differences go up”, he says, not down, with practice. In other words, there’s a Matthew effect on skill acquisition.
Even among simple motor skills, where practice decreases individual differences, it never drowns them entirely. “It’s true that doing more practice helps”, Ackerman says, “but there’s not a single study where variability between subjects disappears entirely.”
“If you go to the grocery store”, he continues, “you can look at the checkout clerk, who is using mostly perceptual motor skill. On average, the people who’ve been doing it for 10 years will get through 10 customers in the time the new people get across one. But the fastest person with 10 years’ experience will still be about 3 times faster than the slowest person with 10 years’ experience.”
“Assessing Human Error Against a Benchmark of Perfection”, (2016-06-15):
An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors.
To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game. We carry out our analysis at a large scale, employing datasets with several million recorded games, and using chess tablebases to acquire a form of ground truth for a subset of chess positions that have been completely solved by computers but remain challenging even for the best players in the world.
We organize our analysis around three categories of features that we argue are present in most settings where the analysis of human error is applicable: the skill of the decision-maker, the time available to make the decision, and the inherent difficulty of the decision. We identify rich structure in all three of these categories of features, and find strong evidence that in our domain, features describing the inherent difficulty of an instance are statistically-significantly more powerful than features based on skill or time.
2017-silver.pdf#page=3: “Mastering the game of Go without human knowledge”, David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis
“Why g matters: The complexity of everyday life”, (1997-01):
Personnel selection research provides much evidence that intelligence (g) is an important predictor of performance in training and on the job, especially in higher level work. This article provides evidence that g has pervasive utility in work settings because it is essentially the ability to deal with cognitive complexity, in particular, with complex information processing. The more complex a work task, the greater the advantages that higher g confers in performing it well. Everyday tasks, like job duties, also differ in their level of complexity. The importance of intelligence therefore differs systematically across different arenas of social life as well as economic endeavor. Data from the National Adult Literacy Survey are used to show how higher levels of cognitive ability systematically improve individual’’s odds of dealing successfully with the ordinary demands of modern life (such as banking, using maps and transportation schedules, reading and understanding forms, interpreting news articles). These and other data are summarized to illustrate how the advantages of higher g, even when they are small, cumulate to affect the overall life chances of individuals at different ranges of the IQ bell curve. The article concludes by suggesting ways to reduce the risks for low-IQ individuals of being left behind by an increasingly complex postindustrial economy.
“On the Impossibility of Supersized Machines”, (2017-03-31):
In recent years, a number of prominent computer scientists, along with academics in fields such as philosophy and physics, have lent credence to the notion that machines may one day become as large as humans. Many have further argued that machines could even come to exceed human size by a significant margin. However, there are at least seven distinct arguments that preclude this outcome. We show that it is not only implausible that machines will ever exceed human size, but in fact impossible.