The gender-equality paradox refers to the puzzling finding that societies with more gender equality demonstrate larger gender differences across a range of phenomena, most notably in the proportion of women who pursue degrees in science, technology, engineering, and math. The present investigation demonstrates across two different measures of gender equality that this paradox extends to chess participation (n = 803,485 across 160 countries; age range: 3–100 years), specifically that women participate more often in countries with less gender equality. Previous explanations for the paradox fail to account for this finding. Instead, consistent with the notion that gender equality reflects a generational shift, mediation analyses suggest that the gender-equality paradox in chess is driven by the greater participation of younger players in countries with less gender equality. A curvilinear effect of gender equality on the participation of female players was also found, demonstrating that gender differences in chess participation are largest at the highest and lowest ends of the gender-equality spectrum.
During the COVID-19 pandemic, traditional (offline) chess tournaments were prohibited, and tournaments were instead held online.
We exploit this unique setting to assess the impact of remote work policies on the cognitive performance of individuals. Using the artificial intelligence embodied in a powerful chess engine [Stockfish 11] to assess the quality of chess moves and associated errors, we find a statistically-significant and economically-important decrease in performance when an individual competes remotely versus offline in a face-to-face setting.
The effect size decreases over time, suggesting an adaptation to the new remote setting.
…During the COVID-19 pandemic, the current chess world champion, Magnus Carlsen, initiated an online tournament series, the Magnus Carlsen Chess Tour. We analyse the performance of players who have participated in these online tournaments and the performance of players participating in recent events of the World Rapid Chess Championship as organized by the World Chess Federation in a traditional offline format. In particular, our main comparison is based on 20 elite chess players who competed both in the online and offline tournaments. We selected these tournaments because they were organized under comparable conditions, in particular, giving players the same amount of thinking time per game, offering comparable prize funds, and implementing strict anti-cheating measures.
We base our performance benchmark on evaluating the moves played by the participants using a currently leading chess engine that substantially outperforms the best human players in terms of playing strength. We use the engine’s evaluation to construct a measure of individual performance that offers a high degree of objectivity and accuracy. Overall, we analyse 214,810 individual moves including 59,273 moves of those 20 players who participated in both the remote online and the traditional offline tournaments. Using a regression model with player fixed effects that allows us to estimate changes in within-player performance, we find the quality of play is substantially worse (at a statistical-significance level of 5%) when the same player competed online versus offline. The adverse effect is particularly pronounced for the first 2 online tournaments, suggesting a partial adaptation to the remote setting in later tournaments.
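The fixed-effects specification described here (referred to as equation (2) in the figure note below) plausibly takes a form like the following; this is a reconstruction from the abstract's description, and the published equation may differ in detail:

$$\text{Error}_{igm} = \delta\,\text{Online}_{g} + \alpha_i + \mu_m + X_{igm}'\beta + \varepsilon_{igm},$$

where $i$ indexes players, $g$ games, and $m$ moves within a game; $\alpha_i$ are player fixed effects, $\mu_m$ move-number fixed effects, $X_{igm}$ the control variables, and $\delta$ the coefficient of interest on the online indicator, with standard errors clustered at the game level.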
…We find that playing online leads to a reduction in the quality of moves. The error variable as defined in equation (3) is, on average, 1.7 units larger when playing online than when playing identical moves in an offline setting. This corresponds to a 1.7% increase of the measure (RawError + 1) or an ~7.5% increase in the RawError…To better assess the size of the effect, we provide a back-of-the-envelope calculation for the change in playing strength when playing online, as expressed in terms of the Elo rating. In our sample, the coefficient on the Elo rating of the player (−0.0001308) is based on a regression without individual fixed effects, indicating that if a player’s Elo rating increases by one point, the error variable as defined in equation (3) is reduced by 0.013 units on average. Playing online increases the error variable, on average, by 1.7 units, which corresponds to a loss of 130 points of Elo rating. The actual drop in playing strength, however, is likely to be lower, because our analysis excludes the opening stage of the game, which is less likely to be affected by the online setting. Moreover, our linear regression model might not account for smaller average error margins at the top of the Elo distribution.
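The Elo conversion above is simple enough to check directly; a minimal sketch using only the two figures quoted in the excerpt (both taken from the text, not recomputed from the underlying data):

```python
# Back-of-the-envelope conversion of the online-play error increase into an
# implied Elo-rating loss, using the figures quoted in the excerpt above.
error_increase_online = 1.7    # average increase in the error variable when playing online
error_per_elo_point = 0.013    # average decrease in the error variable per additional Elo point

implied_elo_loss = error_increase_online / error_per_elo_point
print(round(implied_elo_loss))  # ≈ 131, i.e. the ~130-point loss cited in the text
```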
Figure 1: Effect Heterogeneity by Online Tournament. Notes: The figure shows the estimated coefficient δ̂ based on equation (2). Dots represent the point estimates, the grey (black) bars show the 95% (90%) confidence intervals based on clustered standard errors at the game level. Regressions contain player and move fixed effects as well as the full set of control variables (see Table 3). The opening phase of each game is excluded for each player (m ≤ 15). MCI—Magnus Carlsen Invitational, LARC—Lindores Abbey Rapid Challenge, OCM—Chessable Masters, LoC—Legends of Chess, SO—Skilling Open.
What is being learned by superhuman neural network agents such as AlphaZero? This question is of both scientific and practical interest. If the representations of strong neural networks bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability.
In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network. We also provide a behavioural analysis focusing on opening play, including qualitative analysis from chess Grandmaster Vladimir Kramnik. Finally, we carry out a preliminary investigation looking at the low-level details of AlphaZero’s representations, and make the resulting behavioural and representational analyses available online.
Medical research suggests that particulate matter (PM) increases stress hormones, thereby increasing the feeling of stress, which has been hypothesised to induce individuals to take less risk.
To examine this, we study whether PM10 increases the probability of drawing in chess games using information from the Dutch club competition.
We provide evidence of a reasonably strong effect: A 10 μg/m³ increase in PM10 (33.6% of mean concentration) leads to a 5.6% increase in draws. We examine a range of explanations for these findings.
Our preferred interpretation is that air pollution causes individuals to take less risk.
[Keywords: air pollution, particulate matter, decision-making, risk-taking]
TL;DR: According to 5.5 years of data from 2.3 million players and 450 million games, most beginners will improve their rating by 100 Lichess Elo points in 3–6 months. Most “experienced” chess players in the 1,400–1,800 rating range will take 3–4 years to improve their rating by 100 Lichess Elo points. There’s no strong evidence that playing more games makes you improve quicker.
…After extracting the data for Elo per player over time (including games as white and black), filtering for one time control, calculating the monthly average, aligning everyone’s starting dates, assigning the ratings into rating bins, and averaging the ratings by the rating bins (with 95% confidence intervals), I get the plot below:
Elo vs. time: 820,786 users who gained >1,000 Elo
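A rough pandas sketch of the aggregation pipeline described above; the `games` DataFrame and its column names are hypothetical stand-ins for the Lichess export, not the author's actual code:

```python
import pandas as pd

def improvement_curves(games: pd.DataFrame):
    """Aggregate per-game ratings into binned improvement curves.
    Expects hypothetical columns: player, date (datetime64), time_control, rating."""
    games = games[games["time_control"] == "blitz"].copy()   # keep a single time control
    games["month"] = games["date"].dt.to_period("M")

    # Monthly average rating per player (smooths game-by-game fluctuation)
    monthly = (games.groupby(["player", "month"])["rating"]
                    .mean()
                    .reset_index()
                    .sort_values(["player", "month"]))

    # Align each player's history to their own first active month
    monthly["months_active"] = monthly.groupby("player").cumcount()

    # Bin players by starting rating, then average each bin's trajectory with 95% CIs
    start = monthly[monthly["months_active"] == 0].set_index("player")["rating"]
    monthly["start_bin"] = pd.cut(monthly["player"].map(start),
                                  bins=list(range(600, 2601, 200)))

    curve = monthly.groupby(["start_bin", "months_active"], observed=True)["rating"].agg(["mean", "sem"])
    curve["ci95"] = 1.96 * curve["sem"]   # half-width of an approximate 95% confidence interval
    return monthly, curve
```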
I analyzed the data from the perspective of a player’s monthly average which should be a better estimate of a player’s playing strength than looking at the game-by-game Elo fluctuation. I’m not particularly interested in cases of players who managed to jump 100 points in one afternoon blitz binge session. I believe those instances can be attributed to chance rather than those players suddenly having a “eureka” moment that boosted their playing strength by 100 Elo points overnight.
From the graph, it looks like improvement rate depends a lot on what your current Elo is. As one might expect, lower Elo ratings have the greatest opportunity to improve quickly, while higher Elo ratings will take much longer to see improvement. Most players in the 800–1,000 rating range (about 6% of players) will see their Elo jump up 100 points in just a few months of activity. Most players in the 1,600–2,000 range (27% of players) will take 4 years or more to move up just 100 Elo points…4 years just for 100 Elo points? Seems a bit longer than I expected. But it is plausible.
There are players in the data with long histories of activity who have not improved their rating despite playing many games over the span of many years. See the player who’s played the most games of all time on Lichess:
German11 Elo over time [83,466 games over 7 years, >285 days cumulative time spent playing, Elo rating 1,528, 49.9th percentile]
…The data looks consistent with previous analysis, but I think it better illustrates how only a small percentage of players actually do improve. It looks like only about the top 10% of players achieve meaningful improvement (>100 rating gain) over time, with only about 1% of players breaking past more than 200 Elo in a few years. The majority (~90%) of players seem to hover around their initial rating despite being active on Lichess for several years.
…Here’s another heatmap showing the average time it took for people to achieve X Elo gain, divided up by their starting Elo…The results were actually pretty surprising. It looks like these “outliers” seem to have made these gains in a little less than 2 years! Amazingly, that’s about the amount of time it took GM Hikaru Nakamura to bridge that gap when he was learning chess as a child. So it seems that there is hope for people looking to become strong players. With serious study and dedication, it looks like it’s possible to make massive improvements in a reasonably short amount of time.
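Continuing the hypothetical frame from the earlier sketch, the heatmap described here could be computed roughly as follows (again a sketch of the idea, not the author's code):

```python
def months_to_gain(monthly: pd.DataFrame, targets=(100, 200, 300, 400, 500)) -> pd.DataFrame:
    """Average number of active months before a player first reaches each Elo-gain
    target, split by starting-rating bin. Expects the per-player `monthly` frame
    built in the previous sketch (player, months_active, rating, start_bin)."""
    gain = monthly["rating"] - monthly.groupby("player")["rating"].transform("first")
    start_bins = monthly.groupby("player")["start_bin"].first()

    columns = []
    for target in targets:
        reached = monthly[gain >= target]
        first_hit = reached.groupby("player")["months_active"].min()  # months until the gain is first reached
        columns.append(first_hit.groupby(start_bins).mean().rename(f"+{target}"))

    # rows: starting-rating bin, columns: Elo-gain target
    return pd.concat(columns, axis=1)
```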
…I thought I would add a few remarks on my very first job as chess teacher, which I did at ages 14–15.
Chess teaching isn’t mainly about chess. A chess teacher has to have a certain mystique above all, while at the same time being approachable. Even at 14 this is possible. Your students are hiring you at least as much for your mystique as for the content of your lessons.
Not everyone…wanted to be a better chess player. For some, taking the lesson was a substitute for hard work on chess, not a complement to it. The lesson for them was a fun social experience, and it kept the game of chess salient in their minds. They became “the kind of person who takes chess lessons.” I understood this well at the time. Some of the students wanted to show you their chess games, so that someone else would be sharing in their triumphs and tragedies. That is an OK enough way to proceed with a chess lesson, but often the students were more interested in “showing” than in listening and learning and hearing the hard truths about their play.
Students are too interested in asking your opinion of particular openings. At lower-tier amateur levels of chess, the opening just doesn’t matter that much, provided you don’t get into an untenable position too quickly. Nonetheless openings are a fun thing to learn about, and discussing openings can give people the illusion of learning something important…
What I really had to teach was methods for hard work to improve your game consistently over time. That might include for instance annotating a game or position “blind”, and then comparing your work to the published analysis of a world-class player, a la Alexander Kotov’s Think Like a Grandmaster. [The book is not concerned with advising where pieces should be placed on the board, or tactical motifs, but rather with the method of thinking that should be employed during a game. Kotov’s advice to identify candidate moves and methodically examine them to build up an “analysis tree” remains well known today.] I did try to teach that, but the demand for this service was not always so high.
The younger chess prodigy I taught was quite bright and also likable. But he had no real interest in improving his chess game. Instead, hanging out with me was more fun for him than either doing homework or watching TV, and I suspect his parents understood that. In any case, early on I was thinking keenly about talent and the determinants of ultimate success, and obsessiveness seemed quite important. All of the really good chess players had it, and without it you couldn’t get far above expert level.
Even when machine learning systems surpass human ability in a domain, there are many reasons why AI systems that capture human-like behavior would be desirable: humans may want to learn from them, they may need to collaborate with them, or they may expect them to serve as partners in an extended interaction. Motivated by this goal of human-like AI systems, the problem of predicting human actions—as opposed to predicting optimal actions—has become an increasingly useful task.
We extend this line of work by developing highly accurate personalized models of human behavior in the context of chess. Chess is a rich domain for exploring these questions, since it combines a set of appealing features: AI systems have achieved superhuman performance but still interact closely with human chess players both as opponents and preparation tools, and there is an enormous amount of recorded data on individual players. Starting with an open-source version of AlphaZero trained on a population of human players, we demonstrate that we can significantly improve prediction of a particular player’s moves by applying a series of fine-tuning adjustments. Furthermore, we can accurately perform stylometry—predicting who made a given set of actions—indicating that our personalized models capture human decision-making at an individual level.
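One natural way to use personalized move-prediction models for stylometry, whether or not it is the authors' exact procedure, is maximum-likelihood attribution: score a set of observed moves under each candidate's model and pick the best fit. A minimal, purely illustrative sketch (all names and types are hypothetical):

```python
# Illustrative maximum-likelihood stylometry over per-player move-prediction models.
# Generic sketch of the idea, not necessarily the paper's method.
import math
from typing import Callable, Dict, List, Tuple

# A personalized model maps (position, move) to the probability it assigns that move.
Model = Callable[[str, str], float]

def attribute_games(moves: List[Tuple[str, str]], models: Dict[str, Model]) -> str:
    """Return the candidate player whose model gives the observed
    (position, move) pairs the highest total log-likelihood."""
    def log_likelihood(model: Model) -> float:
        return sum(math.log(max(model(pos, mv), 1e-12)) for pos, mv in moves)
    return max(models, key=lambda player: log_likelihood(models[player]))
```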
As artificial intelligence becomes increasingly intelligent—in some cases, achieving superhuman performance—there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn from. A crucial step in bridging this gap between human and artificial intelligence is modeling the granular actions that constitute human behavior, rather than simply matching aggregate human performance.
We pursue this goal in a model system with a long history in artificial intelligence: chess. The aggregate performance of a chess player unfolds as they make decisions over the course of a game. The hundreds of millions of games played online by players at every skill level form a rich source of data in which these decisions, and their exact context, are recorded in minute detail. Applying existing chess engines to this data, including an open-source implementation of AlphaZero, we find that they do not predict human moves well.
We develop and introduce Maia, a customized version of AlphaZero trained on human chess games, that predicts human moves at a much higher accuracy than existing engines, and can achieve maximum accuracy when predicting decisions made by players at a specific skill level in a tuneable way. For a dual task of predicting whether a human will make a large mistake on the next move, we develop a deep neural network that significantly outperforms competitive baselines. Taken together, our results suggest that there is substantial promise in designing artificial intelligence systems with human collaboration in mind by first accurately modeling granular human decision-making.
The relative importance of different factors in the development of human skills has been extensively discussed. Research on expertise indicates that focused practice may be the sole determinant of skill, while intelligence researchers underline the relative importance of abilities at even the highest level of skill. There is indeed a large body of research that acknowledges the role of both factors in skill development and retention. It is, however, unknown how intelligence and practice come together to enable the acquisition and retention of complex skills across the life span. Instead of focusing on the 2 factors, intelligence and practice, in isolation, here we look at their interplay throughout development. In a longitudinal study that tracked chess players throughout their careers, we show that both intelligence and practice positively affect the acquisition and retention of chess skill. Importantly, the nonlinear interaction between the 2 factors revealed that more intelligent individuals benefited more from practice. With the same amount of practice, they acquired chess skill more quickly than less intelligent players, reached a higher peak performance, and arrested decline in older age. Our research demonstrates the futility of scrutinizing the relative importance of highly intertwined factors in human development.
Gf, Gc, Gsm, and Gs all correlated positively and statistically-significantly with chess skill.
The relationship between Gf and chess skill was moderated by age and skill level.
Chess skill correlated positively with numerical, visuospatial, and verbal ability.
Why are some people more skilled in complex domains than other people?
Here, we conducted a meta-analysis to evaluate the relationship between cognitive ability and skill in chess.
Chess skill correlated positively and statistically-significantly with fluid reasoning (Gf) (r̄ = 0.24), comprehension-knowledge (Gc) (r̄ = 0.22), short-term memory (Gsm) (r̄ = 0.25), and processing speed (Gs) (r̄ = 0.24); the meta-analytic average of the correlations was r̄ = 0.24.
Moreover, the correlation between Gf and chess skill was moderated by age (r̄ = 0.32 for youth samples vs. r̄ = 0.11 for adult samples) and skill level (r̄ = 0.32 for unranked samples vs. r̄ = 0.14 for ranked samples). Interestingly, chess skill correlated more strongly with numerical ability (r̄ = 0.35) than with verbal ability (r̄ = 0.19) or visuospatial ability (r̄ = 0.13).
The results suggest that cognitive ability contributes meaningfully to individual differences in chess skill, particularly in young chess players and/or at lower levels of skill.
[See also “Exact, Exacting: Who is the Most Accurate World Champion?”] An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has begun to consider whether we can characterize and predict the kinds of decisions where people are likely to make errors.
To investigate what a general framework for human error prediction might look like, we focus on a model system with a rich history in the behavioral sciences: the decisions made by chess players as they select moves in a game. We carry out our analysis at a large scale, employing datasets with several million recorded games, and using chess tablebases to acquire a form of ground truth for a subset of chess positions that have been completely solved by computers but remain challenging even for the best players in the world.
We organize our analysis around 3 categories of features that we argue are present in most settings where the analysis of human error is applicable: the skill of the decision-maker, the time available to make the decision, and the inherent difficulty of the decision. We identify rich structure in all three of these categories of features, and find strong evidence that in our domain, features describing the inherent difficulty of an instance are statistically-significantly more powerful than features based on skill or time.
[Discussion of the creation of modern sports training: professional athletes, even NBA stars, typically did not ‘train’. Practice was about getting into shape and working with teammates, if even that much—one simply took one’s skills for granted. Coaches focused on strategy, not coaching.
A harbinger of the professionalization of professional athletes was basketball player Kermit Washington, on the verge of washing out of the NBA early on until he swallowed his pride and began tutoring with coach Pete Newell, who drilled Kermit on the basics repeatedly. Kermit eventually became an All-Star player and influenced other NBA players to engage in coaching and deliberate practice to improve their fundamentals. The modern paradigm is a ruthless quest for perfection in every dimension, quantified, and applying the latest science and technology to eke out even the slightest fraction-of-a-second improvement; athletes are projects, with many different specialists examining them constantly for potential improvements, and as importantly, when not to practice lest they be injured.
And the results speak for themselves—performance has never been higher, the impossible is now done routinely by many professionals, and this continuous-improvement trend has spread to other domains too, including chess, classical music, and business. Equally striking are domains which don’t see trends like this, particularly American education.]
“You need to have the best PhDs onboard as well”, McClusky says. This technological and analytical arms race is producing the best athletes in history.
The arms race centers on an obsessive scrutiny of every aspect of training and performance. Trainers today emphasize sports-specific training over generalized conditioning: if you’re a baseball player, you work on rotational power; if you’re a sprinter, on straight-line explosive power. All sorts of tools have been developed to improve vision, reaction time, and the like. The Dynavision D2 machine is a large board filled with flashing lights, which ballplayers have to slap while reading letters and math equations that the board displays. Football players use Nike’s Vapor Strobe goggles, which periodically cloud for tenth-of-a-second intervals, in order to train their eyes to focus even in the middle of chaos. Training is also increasingly personalized. Players are working not just with their own individual conditioning coaches but also with their own individual skills coaches. In non-team sports, such as tennis and golf, coaches were rare until the seventies. Today, tennis players such as Novak Djokovic have not just a single coach but an entire entourage. In team sports, meanwhile, there’s been a proliferation of gurus. George Whitfield has built a career as a “quarterback whisperer”, turning college quarterbacks into NFL-ready prospects. Ron Wolforth, a pitching coach, is known for resurrecting pitchers’ careers—he recently transformed the Oakland A’s Scott Kazmir from a has-been into an All-Star by revamping his mechanics and motion. Then there’s the increasing use of biometric sensors, equipped with heart-rate monitors, G.P.S., and gyroscopes, to measure not just performance (how fast a player is accelerating or cutting) but also fatigue levels. And since many studies show that getting more sleep leads to better performance, teams are now worrying about that, too. The N.B.A.’s Dallas Mavericks have equipped players with Readiband monitors to measure how much, and how well, they’re sleeping.
All this effort may sound a bit nuts. But it’s how you end up with someone like Chris Hoy, the British cyclist who won two gold medals at the London Olympics in 2012, trailed by a team of scientists, nutritionists, and engineers. Hoy ate a carefully designed diet of five thousand calories a day. His daily workouts—two hours of lifting in the morning, three hours in the velodrome in the afternoon, and an easy one-hour recovery ride in the evening—had been crafted to maximize both his explosive power and his endurance. He had practiced in wind tunnels at the University of Southampton. He had worn biofeedback sensors that delivered exact data to his trainers about how his body was responding to practice. The eighty-thousand-dollar carbon-fibre bike he rode helped, too. Hoy was the ultimate product of an elaborate and finely tuned system designed to create the best cyclist possible. And—since his competitors weren’t slacking, either—he still won by only a fraction of a second.
Ericsson and colleagues argue that deliberate practice explains expert performance.
We tested this view in the two most studied domains in expertise research.
Deliberate practice is not sufficient to explain expert performance.
Other factors must be considered to advance the science of expertise.
Twenty years ago, Ericsson et al 1993 proposed that expert performance reflects a long period of deliberate practice rather than innate ability, or “talent”. Ericsson et al 1993 found that elite musicians had accumulated thousands of hours more deliberate practice than less accomplished musicians, and concluded that their theoretical framework could provide “a sufficient account of the major facts about the nature and scarcity of exceptional performance” (p. 392). The deliberate practice view has since gained popularity as a theoretical account of expert performance, but here we show that deliberate practice is not sufficient to explain individual differences in performance in the two most widely studied domains in expertise research—chess and music. For researchers interested in advancing the science of expert performance, the task now is to develop and rigorously test theories that take into account as many potentially relevant explanatory constructs as possible.
This paper presents a replication and extension of Chi’s (1978) classic study on chess expertise [“Knowledge structures and memory development”]. A major outcome of Chi’s research was that although adult novices had a better memory span than child experts, the children showed better memory for chess positions than the adults.
The major goal of this study was to explore the effects of the following task characteristics on memory performance: (1) Familiarity with the constellation of chess pieces (ie. meaningful versus random positions) and (2) familiarity with both the geometrical structure of the board and the form and color of chess pieces.
The tasks presented to the four groups of subjects (ie. child experts and novices, adult experts and novices) included memory for meaningful and random chess positions as well as memory for the location of wooden pieces of different forms on a board geometrically structured by circles, triangles, rhombuses, etc. (control task 1). Further, a digit span memory task was given (control task 2). The major assumption was that the superiority of experts should be greatest for the meaningful chess positions, somewhat reduced but still statistically-significant for the random positions, and nonsignificant for the board control task.
Only age effects were expected for the digit span task. The results conformed to this pattern, showing that each type of knowledge contributed to the experts’ superior memory span for chess positions.
One of the most extraordinary books ever written about chess and chessplayers, this authoritative study goes well beyond a lucid explanation of how today’s chessmasters and tournament players are rated. Twenty years’ research and practice produce a wealth of thought-provoking and hitherto unpublished material on the nature and development of high-level talent:
Just what constitutes an “exceptional performance” at the chessboard? Can you really profit from chess lessons? What is the lifetime pattern of Grandmaster development? Where are the masters born? Does your child have master potential?
The step-by-step rating system exposition should enable any reader to become an expert on it. For some it may suggest fresh approaches to performance measurement and handicapping in bowling, bridge, golf and elsewhere. 43 charts, diagrams and maps supplement the text.
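For readers who only want the core of the system the book expounds, the expected-score and post-game update rule in the logistic form used by most modern implementations looks like the sketch below (a generic illustration; Elo's own step-by-step exposition uses the normal distribution, and constants differ between federations):

```python
# Standard logistic Elo expected score and post-game rating update
# (generic modern form; the book's exposition and constants differ in places).
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def updated_rating(rating: float, expected: float, actual: float, k: float = 20) -> float:
    """New rating after one game; actual is 1 (win), 0.5 (draw), or 0 (loss)."""
    return rating + k * (actual - expected)

e = expected_score(1600, 1800)       # ≈ 0.24: the lower-rated player is expected to score ~24%
print(updated_rating(1600, e, 1.0))  # ≈ 1615.2 after an upset win with K = 20
```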
How and why are chessmasters statistically remarkable? How much will your rating rise if you work with the devotion of a Steinitz? At what age should study begin? What toll does age take, and when does it begin?
Development of the performance data, covering hundreds of years and thousands of players, has revealed a fresh and exciting version of chess history. One of the many tables identifies 500 all-time chess greats, with personal data and top lifetime performance ratings.
Just what does government assistance do for chess? What is the Soviet secret? What can we learn from the Icelanders? Why did the small city of Plovdiv produce three Grandmasters in only ten years? Who are the untitled dead? Did Euwe take the championship from Alekhine on a fluke? How would Fischer fare against Morphy in a ten-wins match?
“It was inevitable that this fascinating story be written”, asserts FIDE President Max Euwe, who introduces the book and recognizes the major part played by ratings in today’s burgeoning international activity. Although this is the definitive ratings work, with statistics alone sufficient to place it in every reference library, it was written by a gentle scientist for pleasurable reading—for the enjoyment of the truths, the questions, and the opportunities it reveals.