December 2017 gwern.net newsletter with links on genetics/AI/politics/psychology/biology/economics, 2 movie reviews and 1 anime review. (newsletter; 2017-11-30–2021-01-04; finished; certainty: log; importance: 0)
“Common risk variants identified in autism spectrum disorder”, Grove et al 2017 (ASD, as expected, crosses the sample size threshold for hits. More importantly: the genetic correlations imply that autism is not a single thing & that intelligence genes are not inherently ‘autistic’; hence it will be possible to select for intelligence without selecting for autism. I’ve always had a hard time believing the simple genetic correlations between intelligence and ‘autism spectrum disorder’ since it’s unclear why variants that improve intelligence would decrease social functioning which should require intelligence as much as anything else; if it’s simply that there’s heterogeneity and that, say, ‘systematizing’ is shared between intelligence and autism diagnoses, that would be a credible resolution.)
“Postmortem: Every Frame a Painting” (Copyright is why we can’t have nice things: they spent almost as much time dealing with YouTube’s copyright enforcement as editing, it sounds like, and had to give up entirely on major topics like Andrei Tarkovsky because they didn’t think they could trick the IP enforcers into permitting a legitimate exercise of fair use.)
“The Social Origins of Inventors”, Aghion et al 2017 (Since they include no correction for measurement error and the IQ score is quite bad (decimalized!) and patents aren’t so great either, some of their interpretations, particularly the last one, are highly questionable, but the takeaway is that even crude measures of intelligence and innovation show a strong predictive relationship, as expected.)
“IQ decline and Piaget: Does the rot start at the top?”, James Flynn & Shayer 2017 (James Flynn (of the Flynn effect) endorses dysgenics and explores where the impacts are: in the most intelligent children’s ability to abstract and experiment. This is further evidence that the Flynn effect is hollow: because the normal distribution has thin tails, a half-SD or larger average increase should produce an enormous multiplicative increase in the number of children at the highest stage of Piagetian development, rather than an equally enormous fall.)
“Computer latency: 1977–2017” (Why are modern computers so much less pleasant than a 1980s terminal in some ways? One way is that they are actually slower in a crucial way: input/display latency. Luu measures terminal scrolling in a wide variety of devices and scroll latency in mobile browsers and finds decades-old devices are often the best. Even keyboards turn out to vary by a factor of 5 in simple keypress latency!)
Prison School (inside all the juvenile raunch, the BDSM and nipple-hair jokes, in the broad manzai-style comedy familiar from anime comedies like Cromartie High, beats a full-metal shonen heart of gold as the protagonists become nakama and learn the value of manly friendship. Combined with the fairly intricate plotting, I enjoyed PS more than I expected.)
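The thin-tails point in the Flynn & Shayer entry above can be checked with a quick calculation. The sketch below assumes ability is normally distributed with an unchanged SD, and the cutoffs of 2, 3, and 4 SD are purely illustrative (the source does not say where the highest Piagetian stage sits). It shows how much the share of children above a fixed cutoff should grow if the mean rises by half an SD.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def tail_ratio(threshold_sd, shift_sd):
    """Factor by which the share above a fixed threshold grows
    when the population mean rises by shift_sd (SD unchanged)."""
    return upper_tail(threshold_sd - shift_sd) / upper_tail(threshold_sd)

# Illustrative cutoffs (assumptions, not taken from Flynn & Shayer 2017).
for threshold in (2.0, 3.0, 4.0):
    print(f"cutoff +{threshold} SD: x{tail_ratio(threshold, 0.5):.1f}")
```

Under these assumptions the multiplier is roughly 3x at +2 SD, 4.6x at +3 SD, and 7x at +4 SD: the further out the cutoff, the larger the expected increase, which is why an observed fall at the top is so striking.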
Newsletter tag: archive of all issues back to 2013 for the gwern.net newsletter (monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.)
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up reasonably-interesting changes and send them out to the mailing list in addition to a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as an RSS feed). Submissions are categorized similarly to the monthly newsletter and typically will be collated there.
“Common risk variants identified in autism spectrum disorder”, Jakob Grove, Stephan Ripke, Thomas D. Als, Manuel Mattheisen, Raymond Walters, Hyejung Won, Jonatan Pallesen, Esben Agerbo, Ole A. Andreassen, Richard Anney, Rich Belliveau, Francesco Bettella, Joseph D. Buxbaum, Jonas Bybjerg-Grauholm, Marie Bækved-Hansen, Felecia Cerrato, Kimberly Chambert, Jane H. Christensen, Claire Churchhouse, Karin Dellenvall, Ditte Demontis, Silvia De Rubeis, Bernie Devlin, Srdjan Djurovic, Ashle Dumont, Jacqueline Goldstein, Christine S. Hansen, Mads Engel Hauberg, Mads V. Hollegaard, Sigrun Hope, Daniel P. Howrigan, Hailiang Huang, Christina Hultman, Lambertus Klei, Julian Maller, Joanna Martin, Alicia R. Martin, Jennifer Moran, Mette Nyegaard, Terje Nærland, Duncan S. Palmer, Aarno Palotie, Carsten B. Pedersen, Marianne G. Pedersen, Timothy Poterba, Jesper B. Poulsen, Beate St Pourcain, Per Qvist, Karola Rehnström, Avi Reichenberg, Jennifer Reichert, Elise B. Robinson, Kathryn Roeder, Panos Roussos, Evald Saemundsen, Sven Sandin, F. Kyle Satterstrom, George D. Smith, Hreinn Stefansson, Kari Stefansson, Stacy Steinberg, Christine Stevens, Patrick F. Sullivan, Patrick Turley, G. Bragi Walters, Xinyi Xu, Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium, BUPGEN, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, 23andMe Research Team, Daniel Geschwind, Merete Nordentoft, David M. Hougaard, Thomas Werge, Ole Mors, Preben Bo Mortensen, Benjamin M. Neale, Mark J. Daly, Anders D. Børglum (2017-11-25):
Autism spectrum disorder (ASD) is a highly heritable and heterogeneous group of neurodevelopmental phenotypes diagnosed in more than 1% of children. Common genetic variants contribute substantially to ASD susceptibility, but to date no individual variants have been robustly associated with ASD. With a marked sample size increase from a unique Danish population resource, we report a genome-wide association meta-analysis of 18,381 ASD cases and 27,969 controls that identifies five genome-wide significant loci. Leveraging GWAS results from three phenotypes with significantly overlapping genetic architectures (schizophrenia, major depression, and educational attainment), seven additional loci shared with other traits are identified at equally strict significance levels. Dissecting the polygenic architecture we find both quantitative and qualitative polygenic heterogeneity across ASD subtypes, in contrast to what is typically seen in other complex disorders. These results highlight biological insights, particularly relating to neuronal function and corticogenesis and establish that GWAS performed at scale will be much more productive in the near term in ASD, just as it has been in a broad range of important psychiatric and diverse medical phenotypes.
Important traits in agricultural, natural, and human populations are increasingly being shown to be under the control of many genes that individually contribute only a small proportion of genetic variation. However, the majority of modern tools in quantitative and population genetics, including genome wide association studies and selection mapping protocols, are designed to target the identification of individual genes with large effects. We have developed an approach to identify traits that have been under selection and are controlled by large numbers of loci. In contrast to existing methods, our technique utilizes additive effects estimates from all available markers, and relates these estimates to allele frequency change over time. Using this information, we generate a composite statistic, denoted Ĝ, which can be used to test for significant evidence of selection on a trait. Our test requires genotypic data from multiple time points but only a single time point with phenotypic information. Simulations demonstrate that Ĝ is powerful for identifying selection, particularly in situations where the trait being tested is controlled by many genes, which is precisely the scenario where classical approaches for selection mapping are least powerful. We apply this test to breeding populations of maize and chickens, where we demonstrate the successful identification of selection on traits that are documented to have been under selection.
The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.
Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the-art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the original non-saturating GAN.
Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.
Slide deck for Google Brain presentation on Machine Learning and the future of ML development processes. Conclusions: ML hardware is in its infancy. Even faster systems and wider deployment will lead to many more breakthroughs across a wide range of domains. Learning in the core of all of our computer systems will make them better/more adaptive. There are many opportunities for this.
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work just provides a glimpse of what might be possible.
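The "index as a model" idea can be sketched in a few lines. This is a toy illustration, not the paper's method: it swaps in a single least-squares line for the neural nets the abstract describes, and the class name `LinearLearnedIndex` and its methods are hypothetical. The model predicts a key's position in a sorted list, and the worst prediction error seen at build time bounds a small binary search around the guess.

```python
import bisect

class LinearLearnedIndex:
    """Toy learned index: a least-squares line maps key -> position
    in a sorted list (a simplified stand-in for a learned model)."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        positions = range(n)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p)
                  for k, p in zip(self.keys, positions))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p)
                           for k, p in zip(self.keys, positions))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return a position holding key, or None if the key is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        # Binary search only inside the error-bounded window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None

index = LinearLearnedIndex([3, 7, 8, 15, 22, 40, 41, 100])
print(index.lookup(22), index.lookup(5))
```

The design choice worth noting is the error bound: because every stored key's true position lies within `max_err` of its prediction, the windowed search never misses a present key, and the better the model fits the key distribution, the smaller the window becomes.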
Autonomous AI systems (Agent AIs) trained using reinforcement learning can do harm when they take wrong actions, especially superintelligent Agent AIs. One solution would be to eliminate their agency by not giving AIs the ability to take actions, confining them to purely informational or inferential tasks such as classification or prediction (Tool AIs), and have all actions be approved & executed by humans, giving equivalently superintelligent results without the risk.
I argue that this is not an effective solution for two major reasons. First, because Agent AIs will by definition be better at actions than Tool AIs, giving an economic advantage. Secondly, because Agent AIs will be better at inference & learning than Tool AIs, and this is inherently due to their greater agency: the same algorithms which learn how to perform actions can be used to select important datapoints to learn inference over, how long to learn, how to more efficiently execute inference, how to design themselves, how to optimize hyperparameters, how to make use of external resources such as long-term memories or external software or large databases or the Internet, and how best to acquire new data. All of these actions will result in Agent AIs more intelligent than Tool AIs, in addition to their greater economic competitiveness. Thus, Tool AIs will be inferior to Agent AIs in both actions and intelligence, implying use of Tool AIs is an even more highly unstable equilibrium than previously argued, as users of Agent AIs will be able to outcompete them on two dimensions (and not just one).
We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low-resolution and still far from realistic. In this work, we generate 2048×1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing/adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.
The Global Burden of Disease Study (GBD) is a comprehensive regional and global research program of disease burden that assesses mortality and disability from major diseases, injuries, and risk factors. GBD is a collaboration of over 3600 researchers from 145 countries. Under principal investigator Christopher J.L. Murray, GBD is based out of the Institute for Health Metrics and Evaluation (IHME) at the University of Washington and funded by the Bill and Melinda Gates Foundation.
Piaget's theory of cognitive development is a comprehensive theory about the nature and development of human intelligence. It was originated by the Swiss developmental psychologist Jean Piaget (1896–1980). The theory deals with the nature of knowledge itself and how humans gradually come to acquire, construct, and use it. Piaget's theory is mainly known as a developmental stage theory. Piaget "was intrigued by the fact that children of different ages made different kinds of mistakes while solving problems". He also believed that children are not like "little adults" who may know less; children just think and speak differently. Holding that children have great cognitive abilities, Piaget proposed four stages of cognitive development, which he put to the test, associating each stage with a different age range and describing how children develop their cognitive skills within it. For example, he believed that children experience the world through actions, representing things with words, thinking logically, and using reasoning.
I’ve had this nagging feeling that the computers I use today feel slower than the computers I used as a kid. As a rule, I don’t trust this kind of feeling because human perception has been shown to be unreliable in empirical studies, so I carried around a high-speed camera and measured the response latency of devices I’ve run into in the past few months. These are tests of the latency between a keypress and the display of a character in a terminal (see appendix for more details)…If we look at overall results, the fastest machines are ancient. Newer machines are all over the place. Fancy gaming rigs with unusually high refresh-rate displays are almost competitive with machines from the late 70s and early 80s, but “normal” modern computers can’t compete with thirty to forty year old machines.
…Almost every computer and mobile device that people buy today is slower than common models of computers from the 70s and 80s. Low-latency gaming desktops and the iPad Pro can get into the same range as quick machines from thirty to forty years ago, but most off-the-shelf devices aren’t even close.
If we had to pick one root cause of latency bloat, we might say that it’s because of “complexity”. Of course, we all know that complexity is bad. If you’ve been to a non-academic non-enterprise tech conference in the past decade, there’s a good chance that there was at least one talk on how complexity is the root of all evil and we should aspire to reduce complexity.
Unfortunately, it’s a lot harder to remove complexity than to give a talk saying that we should remove complexity. A lot of the complexity buys us something, either directly or indirectly. When we looked at the input of a fancy modern keyboard vs. the Apple 2 keyboard, we saw that using a relatively powerful and expensive general purpose processor to handle keyboard inputs can be slower than dedicated logic for the keyboard, which would both be simpler and cheaper. However, using the processor gives people the ability to easily customize the keyboard, and also pushes the problem of “programming” the keyboard from hardware into software, which reduces the cost of making the keyboard. The more expensive chip increases the manufacturing cost, but considering how much of the cost of these small-batch artisanal keyboards is the design cost, it seems like a net win to trade manufacturing cost for ease of programming.
[He measures 21 keyboard latencies using a logic analyzer, finding a range of 15–60ms (!), representing a waste of a large fraction of the available ~100–200ms latency budget before a user notices and is irritated (“the median keyboard today adds as much latency as the entire end-to-end pipeline as a fast machine from the 70s.”). The latency estimates are surprising, and do not correlate with advertised traits. They simply have to be measured empirically.]
We can see that, even with the limited set of keyboards tested, there can be as much as a 45ms difference in latency between keyboards. Moreover, a modern computer with one of the slower keyboards attached can’t possibly be as responsive as a quick machine from the 70s or 80s because the keyboard alone is slower than the entire response pipeline of some older computers. That establishes the fact that modern keyboards contribute to the latency bloat we’ve seen over the past forty years…Most keyboards add enough latency to make the user experience noticeably worse, and keyboards that advertise speed aren’t necessarily faster. The two gaming keyboards we measured weren’t faster than non-gaming keyboards, and the fastest keyboard measured was a minimalist keyboard from Apple that’s marketed more on design than speed.
Icarus is a 2017 American documentary film by Bryan Fogel, which chronicles Fogel's exploration of the option of doping to win an amateur cycling race and happening upon a major international doping scandal when he asks for the help of Grigory Rodchenkov, the head of the Russian anti-doping laboratory. It premiered at Sundance Film Festival on January 20, 2017, and was awarded the U.S. Documentary Special Jury Award. Netflix acquired the distribution rights and released Icarus globally on August 4, 2017. At the 90th Academy Awards, the film won the Academy Award for Best Documentary Feature.
Rollerball is a 1975 science fiction sports film directed and produced by Norman Jewison. It stars James Caan, John Houseman, Maud Adams, John Beck, Moses Gunn and Ralph Richardson. The screenplay, written by William Harrison, adapted his own short story, "Roller Ball Murder", which had first appeared in the September 1973 issue of Esquire.
Prison School is a Japanese manga series written and illustrated by Akira Hiramoto. It was serialized in Kodansha's Weekly Young Magazine from February 2011 to December 2017. Yen Press has licensed the manga in North America. A 12-episode anime adaptation directed by Tsutomu Mizushima aired between July and September 2015, while a live-action drama television series aired from October to December 2015.
Subscription page for the monthly gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives since December 2013.
Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
These graphs show the distribution of latencies for each terminal. The y-axis has the latency in milliseconds. The x-axis is the percentile (e.g., 50 represents the 50%-ile keypress, i.e., the median keypress). Measurements are with macOS unless otherwise stated. The graph on the left is when the machine is idle, and the graph on the right is under load. If we just look at median latencies, some setups don’t look too bad: terminal.app and emacs-eshell are at roughly 5ms unloaded, small enough that many people wouldn’t notice. But most terminals (st, alacritty, hyper, and iterm2) are in the range where you might expect people to notice the additional latency even when the machine is idle. If we look at the tail when the machine is idle, say the 99.9%-ile latency, every terminal gets into the range where the additional latency ought to be perceptible, according to studies on user interaction. For reference, the internally generated keypress to GPU memory trip for some terminals is slower than the time it takes to send a packet from Boston to Seattle and back, about 70ms.
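The gap between a comfortable median and a perceptible tail is easy to reproduce. The sketch below uses a nearest-rank percentile (one common definition; the source does not say which one its graphs use) on made-up latency samples, so the numbers are illustrative rather than measurements of any real terminal.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical keypress-to-display latencies in milliseconds
# (invented for illustration, not taken from the article).
latencies_ms = [5] * 990 + [20] * 8 + [80] * 2

median_ms = percentile(latencies_ms, 50)
tail_ms = percentile(latencies_ms, 99.9)
print(f"median: {median_ms} ms, 99.9%-ile: {tail_ms} ms")
```

With these invented samples the median is 5 ms while the 99.9%-ile is 80 ms, which is the pattern the passage describes: a setup can look fine at the median and still have a tail well inside the perceptible range.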
…Most terminals have enough latency that the user experience could be improved if the terminals concentrated more on latency and less on other features or other aspects of performance. However, when I search for terminal benchmarks, I find that terminal authors, if they benchmark anything, benchmark the speed of sinking stdout or memory usage at startup. This is unfortunate because most “low performance” terminals can already sink stdout many orders of magnitude faster than humans can keep up with, so further optimizing stdout throughput has a relatively small impact on actual user experience for most users. Likewise for reducing memory usage when an idle terminal uses 0.01% of the memory on my old and now quite low-end laptop. If you work on a terminal, perhaps consider relatively more latency and interactivity (e.g., responsiveness to ^C) optimization and relatively less throughput and idle memory usage optimization.
A couple years ago, I took a road trip from Wisconsin to Washington and mostly stayed in rural hotels on the way. I expected the internet in rural areas too sparse to have cable internet to be slow, but I was still surprised that a large fraction of the web was inaccessible. Some blogs with lightweight styling were readable, as were pages by academics who hadn’t updated the styling on their website since 1995. But very few commercial websites were usable (other than Google). When I measured my connection, I found that the bandwidth was roughly comparable to what I got with a 56k modem in the 90s. The latency and packet loss were significantly worse than the average day on dialup: latency varied between 500ms and 1000ms and packet loss varied between 1% and 10%. Those numbers are comparable to what I’d see on dialup on a bad day.
Despite my connection being only a bit worse than it was in the 90s, the vast majority of the web wouldn’t load…When Microsoft looked at actual measured connection speeds, they found that half of Americans don’t have broadband speed. Heck, AOL had 2 million dial-up subscribers in 2015, just AOL alone. Outside of the U.S., there are even more people with slow connections. I recently chatted with Ben Kuhn, who spends a fair amount of time in Africa, about his internet connection:
I’ve seen ping latencies as bad as ~45 sec and packet loss as bad as 50% on a mobile hotspot in the evenings from Jijiga, Ethiopia. (I’m here now and currently I have 150ms ping with no packet loss but it’s 10am). There are some periods of the day where it ~never gets better than 10 sec and ~10% loss. The internet has gotten a lot better in the past ~year; it used to be that bad all the time except in the early mornings.
…Let’s load some websites that programmers might frequent with a variety of simulated connections to get data on page load times…The timeout for tests was 6 minutes; anything slower than that is listed as FAIL. Pages that failed to load are also listed as FAIL. A few things that jump out from the table are:
A large fraction of the web is unusable on a bad connection. Even on a good (0% packet loss, no ping spike) dialup connection, some sites won’t load…If you were to look at the 90%-ile results, you’d see that most pages fail to load on dialup and the “Bad” and “😱” connections are hopeless for almost all sites.
Some sites will use a lot of data!
…The flaw in the “page weight doesn’t matter because average speed is fast” [claim] is that if you average the connection of someone in my apartment building (which is wired for 1Gbps internet) and someone on 56k dialup, you get an average speed of 500 Mbps. That doesn’t mean the person on dialup is actually going to be able to load a 5MB website. The average speed of 3.9 Mbps comes from a 2014 Akamai report, but it’s just an average. If you look at Akamai’s 2016 report, you can find entire countries where more than 90% of IP addresses are slower than that!…“Use bcrypt” has become the mantra for a reasonable default if you’re not sure what to do when storing passwords. The web would be a nicer place if “use webpagetest” caught on in the same way. It’s not always the best tool for the job, but it sure beats the current defaults.
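The averaging argument can be worked through directly. This sketch assumes 1 MB = 10^6 bytes and an idealized transfer with no latency, packet loss, or protocol overhead, so real page loads would be slower than these figures.

```python
def download_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: payload bits divided by link rate."""
    return size_bytes * 8 / bits_per_second

FIBER_BPS = 1_000_000_000   # 1 Gbps apartment connection
DIALUP_BPS = 56_000         # 56k dialup
PAGE_BYTES = 5_000_000      # 5 MB page, assuming 1 MB = 10**6 bytes

mean_bps = (FIBER_BPS + DIALUP_BPS) / 2
fiber_s = download_seconds(PAGE_BYTES, FIBER_BPS)
dialup_s = download_seconds(PAGE_BYTES, DIALUP_BPS)

print(f"mean speed: {mean_bps / 1e6:.3f} Mbps")
print(f"5 MB on fiber:  {fiber_s:.2f} s")
print(f"5 MB on dialup: {dialup_s:.0f} s ({dialup_s / 60:.1f} min)")
```

The mean comes out at about 500 Mbps, yet the dialup user needs roughly 714 seconds (about 12 minutes) for the same page even under ideal conditions, which is already longer than the 6-minute timeout used in the page-load tests above.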
In this article I examine human and machine aspects of typing latency (“typing lag”) and present experimental data on latency of popular text / code editors. The article is inspired by my work on implementing “zero-latency typing” in IntelliJ IDEA.
1.1. Feedback 1.2. Motor skill 1.3. Internal model 1.4. Multisensory integration 1.5. Effects
3.1. Configuration 3.2. Methodology 3.3. Windows 3.4. Linux 3.5. VirtualBox
…To measure processing delays experimentally I created Typometer, a tool to determine and analyze visual latency of text editors (sources). Typometer works by generating OS input events and using screen capture to measure the delay between a keystroke and a corresponding screen update. Hence, the measurement encompasses all the constituents of processing latency (i.e., OS queue, VM, editor, GPU pipeline, buffering, window manager and possible V-Sync). That is the right thing to do, because all those components are inherently intertwined with the editor, and in principle, the editor application has influence on all the parts.…[He tested 9] Editors: Atom 1.1 / Eclipse 4.5.1 / Emacs 24.5.1 / Gedit 3.10.4 / GVim 7.4.712 / IntelliJ Idea CE 15.0 / Netbeans 8.1 / Notepad++ 6.8.4 / Sublime Text 3083.
Apparently, editors are not created equal (at least, from the standpoint of latency).