February 2018 gwern.net newsletter with 3 new essays and links on genetics/AI/psychology/economics, 1 book review, 1 movie review, and 7 pieces of music.
newsletter · 2018-01-28–2021-01-09 · finished · certainty: log · importance: 0
“DeepGS: Predicting phenotypes from genotypes using Deep Learning”, Ma et al 2017 (rrBLUP is one of the best generic methods if you’re working with relatively small n/p GWAS data, and this is not a setup or design that favors CNNs: single trait w/no auxiliary data, tiny n/p, shallow CNN, no hyperparameter or architecture optimization or data augmentation; but the crippled CNN still beats rrBLUP. What would happen if one did a full-strength CNN GWAS across the UKBB? The memory and computation requirements would be exorbitant, but one can dream.)
“2017 was the year consumer DNA testing blew up” (“More people took genetic ancestry tests last year than in all previous years, combined. The number of people who have had their DNA analyzed with direct-to-consumer genetic genealogy tests more than doubled during 2017 and now exceeds 12 million, according to industry estimates.”)
The Alinea Project (I’m awed by the depth of this guy’s foray into doing “modernist cuisine” at home and replicating every single Alinea recipe published in Alinea, with impeccable photographic documentation.)
“The Robots Are Coming for Garment Workers” (Automation comes to the last few steps in the clothes-making process. Reminds me of Hanson’s model of how automation/tech/capital initially serves as a complement to labor, making labor ever better off, but then at a sufficient point of development, flips over to being a substitute for labor.)
Newsletter tag: archive of all issues back to 2013 for the gwern.net newsletter (monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.)
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up reasonably-interesting changes and send them out to the mailing list along with a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as a RSS feed). Submissions are categorized similar to the monthly newsletter and typically will be collated there.
Genetic selection & engineering technologies, if banned or highly regulated, could exacerbate existing social inequality by increasing genetic differences between groups on key traits like intelligence or Conscientiousness or ethnocentrism and ensuring near-permanent continuity of wealth or power. Whether this is a serious problem quantitatively with feasible levels of embryo selection has not been much examined. I consider the specific scenario of a single family, such as a royal family or wealthy corporate owner, which wishes to increase the odds of succession to a sufficiently-competent heir who can maintain the dynasty. I suggest a toy model treating it as a repeated liability-threshold model in which heirs are selected as order statistics and if any heir is above a threshold, the dynasty survives another generation; given average numbers of generations and heirs, this defines a unique threshold of competence. Adding embryo selection turns this into a two-stage selection process. In some scenarios, assuming a threshold of ~+1SD and advanced polygenic scores for multiple selection, embryo selection could considerably increase the lifespan of a dynasty due to tail effects on the increased mean in each stage.
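A minimal sketch of the toy model's core arithmetic, under loud simplifying assumptions that are not from the essay: heirs are treated as independent standard-normal draws, generations are independent, and embryo selection is collapsed into a single hypothetical upward shift of the heirs' mean (in SDs) rather than modeled as a true two-stage process. The function names and the example values (4 heirs, shifts of 0/0.5/1 SD) are illustrative only.

```python
from statistics import NormalDist  # Python 3.8+


def survival_probability(threshold_sd, n_heirs, mean_shift_sd=0.0):
    """P(the best of n_heirs exceeds the competence threshold) in one generation.

    Assumption: heirs are independent draws from N(mean_shift_sd, 1).
    """
    p_one_heir_fails = NormalDist(mu=mean_shift_sd, sigma=1.0).cdf(threshold_sd)
    # The dynasty fails only if every heir falls below the threshold.
    return 1.0 - p_one_heir_fails ** n_heirs


def expected_successions(threshold_sd, n_heirs, mean_shift_sd=0.0):
    """Expected number of successful successions before the first failure.

    With independent generations this is the mean of a geometric
    distribution: p / (1 - p).
    """
    p = survival_probability(threshold_sd, n_heirs, mean_shift_sd)
    return p / (1.0 - p)


if __name__ == "__main__":
    # Hypothetical scenario: threshold of +1SD, 4 heirs per generation.
    for shift in (0.0, 0.5, 1.0):
        print(shift, round(expected_successions(1.0, 4, shift), 2))
```

Because survival depends on a tail probability, even a modest shift in the mean changes the per-generation failure rate disproportionately, which is the tail effect the paragraph refers to.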
“Genome-wide study identifies 611 loci associated with risk tolerance and risky behaviors”, Richard Karlsson Linnér, Pietro Biroli, Edward Kong, S. Fleur W. Meddens, Robbee Wedow, Mark Alan Fontana, Maël Lebreton, Abdel Abdellaoui, Anke R. Hammerschlag, Michel G. Nivard, Aysu Okbay, Cornelius A. Rietveld, Pascal N. Timshel, Stephen P. Tino, Maciej Trzaskowski, Ronald de Vlaming, Christian L. Zünd, Yanchun Bao, Laura Buzdugan, Ann H. Caplin, Chia-Yen Chen, Peter Eibich, Pierre Fontanillas, Juan R. Gonzalez, Peter K. Joshi, Ville Karhunen, Aaron Kleinman, Remy Z. Levin, Christina M. Lill, Gerardus A. Meddens, Gerard Muntané, Sandra Sanchez-Roige, Frank J. van Rooij, Erdogan Taskesen, Yang Wu, Futao Zhang, 23andMe Research Team, eQTLgen Consortium, International Cannabis Consortium, Psychiatric Genomics Consortium, Social Science Genetic Association Consortium, Adam Auton, Jason D. Boardman, David W. Clark, Andrew Conlin, Conor C. Dolan, Urs Fischbacher, Patrick JF Groenen, Kathleen Mullan Harris, Gregor Hasler, Albert Hofman, Mohammad A. Ikram, Sonia Jain, Robert Karlsson, Ronald C. Kessler, Maarten Kooyman, James MacKillop, Minna Männikkö, Carlos Morcillo-Suarez, Matthew B. McQueen, Klaus M. Schmidt, Melissa C. Smart, Matthias Sutter, A. Roy Thurik, Andre G. Uitterlinden, Jon White, Harriet de Wit, Jian Yang, Lars Bertram, Dorret Boomsma, Tõnu Esko, Ernst Fehr, David A. Hinds, Magnus Johannesson, Meena Kumari, David Laibson, Patrik KE Magnusson, Michelle N. Meyer, Arcadi Navarro, Abraham A. Palmer, Tune H. Pers, Danielle Posthuma, Daniel Schunk, Murray B. Stein, Rauli Svento, Henning Tiemeier, Paul RHJ Timmers, Patrick Turley, Robert J. Ursano, Gert G. Wagner, James F. Wilson, Jacob Gratten, James J. Lee, David Cesarini, Daniel J. Benjamin, Philipp D. Koellinger, Jonathan P. Beauchamp (2018-02-08):
Humans vary substantially in their willingness to take risks. In a combined sample of over one million individuals, we conducted genome-wide association studies (GWAS) of general risk tolerance, adventurousness, and risky behaviors in the driving, drinking, smoking, and sexual domains. We identified 611 approximately independent genetic loci associated with at least one of our phenotypes, including 124 with general risk tolerance. We report evidence of substantial shared genetic influences across general risk tolerance and risky behaviors: 72 of the 124 general risk tolerance loci contain a lead SNP for at least one of our other GWAS, and general risk tolerance is moderately to strongly genetically correlated with a range of risky behaviors. Bioinformatics analyses imply that genes near general-risk-tolerance-associated SNPs are highly expressed in brain tissues and point to a role for glutamatergic and GABAergic neurotransmission. We find no evidence of enrichment for genes previously hypothesized to relate to risk tolerance.
Genomic selection (GS) is a new breeding strategy by which the phenotypes of quantitative traits are usually predicted based on genome-wide markers of genotypes using conventional statistical models. However, the GS prediction models typically make strong assumptions and perform linear regression analysis, limiting their accuracies since they do not capture the complex, non-linear relationships within genotypes, and between genotypes and phenotypes.
Results: We present a deep learning method, named DeepGS, to predict phenotypes from genotypes. Using a deep convolutional neural network, DeepGS uses hidden variables that jointly represent features in genotypic markers when making predictions; it also employs convolution, sampling and dropout strategies to reduce the complexity of high-dimensional marker data. We used a large GS dataset to train DeepGS and compare its performance with other methods. In terms of mean normalized discounted cumulative gain value, DeepGS achieves an increase of 27.70% to 246.34% over a conventional neural network in selecting top-ranked 1% individuals with high phenotypic values for the eight tested traits. Additionally, compared with the widely used method RR-BLUP, DeepGS still yields a relative improvement ranging from 1.44% to 65.24%. Through extensive simulation experiments, we also demonstrated the effectiveness and robustness of DeepGS in the absence of outlier individuals and subsets of genotypic markers. Finally, we illustrated the complementarity of DeepGS and RR-BLUP with an ensemble learning approach for further improving prediction performance.
DeepGS is provided as an open source R package available at https://github.com/cma2015/DeepGS.
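The abstract's headline metric is a normalized discounted cumulative gain over the top-ranked individuals. The following is only an illustrative sketch of one common NDCG@k form (linear gain, 1/log2(rank+1) discount), which may differ in detail from what the DeepGS package computes; the function name `ndcg_at_k` and the toy phenotype values are hypothetical, and phenotypes are assumed positive so the ideal score is nonzero.

```python
import math


def ndcg_at_k(observed, predicted, k):
    """NDCG@k: how well the top-k by *predicted* value capture high *observed* values.

    Assumptions: linear gain, discount 1/log2(rank + 1), positive phenotypes.
    """
    # Indices of individuals, best predicted first.
    by_prediction = sorted(range(len(observed)),
                           key=lambda i: predicted[i], reverse=True)
    # Best achievable ordering: sort the observed phenotypes themselves.
    ideal = sorted(observed, reverse=True)
    discounts = [1.0 / math.log2(rank + 2) for rank in range(k)]
    dcg = sum(observed[by_prediction[r]] * discounts[r] for r in range(k))
    ideal_dcg = sum(ideal[r] * discounts[r] for r in range(k))
    return dcg / ideal_dcg


if __name__ == "__main__":
    phenotypes = [3.0, 2.0, 1.0]          # toy observed values
    good_model = [0.9, 0.5, 0.1]          # ranks individuals correctly
    bad_model = [0.1, 0.5, 0.9]           # ranks them in reverse
    print(ndcg_at_k(phenotypes, good_model, k=1))
    print(ndcg_at_k(phenotypes, bad_model, k=1))
```

A breeder selecting only the top 1% cares about exactly this kind of rank-weighted score rather than overall correlation, which is presumably why the paper reports it alongside comparisons to RR-BLUP.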
“Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection”, Antonio F. Pardiñas, Peter Holmans, Andrew J. Pocklington, Valentina Escott-Price, Stephan Ripke, Noa Carrera, Sophie E. Legge, Sophie Bishop, Darren Cameron, Marian L. Hamshere, Jun Han, Leon Hubbard, Amy Lynham, Kiran Mantripragada, Elliott Rees, James H. MacCabe, Steven A. McCarroll, Bernhard T. Baune, Gerome Breen, Enda M. Byrne, Udo Dannlowski, Thalia C. Eley, Caroline Hayward, Nicholas G. Martin, Andrew M. McIntosh, Robert Plomin, David J. Porteous, Naomi R. Wray, Armando Caballero, Daniel H. Geschwind, Laura M. Huckins, Douglas M. Ruderfer, Enrique Santiago, Pamela Sklar, Eli A. Stahl, Hyejung Won, Esben Agerbo, Thomas D. Als, Ole A. Andreassen, Marie Bækvad-Hansen, Preben Bo Mortensen, Carsten Bøcker Pedersen, Anders D. Børglum, Jonas Bybjerg-Grauholm, Srdjan Djurovic, Naser Durmishi, Marianne Giørtz Pedersen, Vera Golimbet, Jakob Grove, David M. Hougaard, Manuel Mattheisen, Espen Molden, Ole Mors, Merete Nordentoft, Milica Pejovic-Milovancevic, Engilbert Sigurdsson, Teimuraz Silagadze, Christine Søholm Hansen, Kari Stefansson, Hreinn Stefansson, Stacy Steinberg, Sarah Tosato, Thomas Werge, GERAD1 Consortium, CRESTAR Consortium, David A. Collier, Dan Rujescu, George Kirov, Michael J. Owen, Michael C. O’Donovan and James T. R. Walters (2018):
Schizophrenia is a debilitating psychiatric condition often associated with poor quality of life and decreased life expectancy. Lack of progress in improving treatment outcomes has been attributed to limited knowledge of the underlying biology, although large-scale genomic studies have begun to provide insights. We report a new genome-wide association study of schizophrenia (11,260 cases and 24,542 controls), and through meta-analysis with existing data we identify 50 novel associated loci and 145 loci in total. Through integrating genomic fine-mapping with brain expression and chromosome conformation data, we identify candidate causal genes within 33 loci. We also show for the first time that the common variant association signal is highly enriched among genes that are under strong selective pressures. These findings provide new insights into the biology and genetic architecture of schizophrenia, highlight the importance of mutation-intolerant genes and suggest a mechanism by which common risk variants persist in the population.
Background selection describes the loss of genetic diversity at a non-deleterious locus due to negative selection against linked deleterious alleles. It is one form of linked selection, where the maintenance or removal of an allele from a population is dependent upon the alleles in its linkage group. The name emphasizes the fact that the genetic background, or genomic environment, of a neutral mutation has a significant impact on whether it will be preserved or purged from a population. In some cases, the term background selection is used broadly to refer to all forms of linked selection, but most often it is used only when neutral variation is reduced due to negative selection against deleterious mutations. Background selection and all forms of linked selection contradict the assumption of the neutral theory of molecular evolution that the fixation or loss of neutral alleles is entirely stochastic, the result of genetic drift. Instead, these models predict that neutral variation is correlated with the selective pressures acting on linked non-neutral genes, that neutral traits are not necessarily oblivious to selection. Because they segregate together, non-neutral mutations linked to neutral polymorphisms result in decreased levels of genetic variation relative to predictions of neutral evolution.
Humans and animals are capable of learning a new behavior by observing others perform the skill just once. We consider the problem of allowing a robot to do the same – learning from raw video pixels of a human, even when there is substantial domain shift in the perspective, environment, and embodiment between the robot and the observed human. Prior approaches to this problem have hand-specified how human and robot actions correspond and often relied on explicit human pose detection systems. In this work, we present an approach for one-shot learning from a video of a human by using human and robot demonstration data from a variety of previous tasks to build up prior knowledge through meta-learning. Then, combining this prior knowledge and only a single video demonstration from a human, the robot can perform the task that the human demonstrated. We show experiments on both a PR2 arm and a Sawyer arm, demonstrating that after meta-learning, the robot can learn to place, push, and pick-and-place new objects using just one video of a human performing the manipulation.
Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent’s competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the frontiers that has resisted quick progress. To test continual learning capabilities we consider a challenging 3D domain with an implicit sequence of tasks and sparse rewards. We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.
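The abstract names V-trace without spelling it out. As a rough sketch, the backward recursion below follows the V-trace value target as described in the IMPALA paper: temporal-difference errors are weighted by importance ratios π/μ truncated at ρ̄, and traces are cut with ratios truncated at c̄. The function name, argument names, and toy numbers are hypothetical, and this is plain Python for a single trajectory, not the paper's batched implementation.

```python
def vtrace_targets(rewards, values, bootstrap_value, ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace targets v_s for one trajectory of length n.

    rewards[t], values[t] = V(x_t), ratios[t] = pi(a_t|x_t) / mu(a_t|x_t);
    bootstrap_value = V(x_n) for the state after the last step.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    correction = 0.0  # holds v_{s+1} - V(x_{s+1}) while sweeping backwards
    for s in reversed(range(n)):
        rho = min(rho_bar, ratios[s])  # truncated weight on the TD error
        c = min(c_bar, ratios[s])      # truncated trace-cutting coefficient
        delta = rho * (rewards[s] + gamma * next_values[s] - values[s])
        correction = delta + gamma * c * correction
        targets[s] = values[s] + correction
    return targets


if __name__ == "__main__":
    # On-policy data (all ratios 1): targets reduce to n-step returns.
    print(vtrace_targets([1.0, 1.0], [0.5, 0.2], 0.3, [1.0, 1.0], gamma=0.9))
```

With all ratios equal to 1 the correction terms telescope into the ordinary n-step bootstrapped return, which is a convenient sanity check; the truncation only matters when the actors' policy lags behind the learner's.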
A key challenge in model-based reinforcement learning (RL) is to synthesize computationally efficient and accurate environment models. We show that carefully designed generative models that learn and operate on compact state representations, so-called state-space models, substantially reduce the computational costs for predicting outcomes of sequences of actions. Extensive experiments establish that state-space models accurately capture the dynamics of Atari games from the Arcade Learning Environment from raw pixels. The computational speed-up of state-space models while maintaining high accuracy makes their application in RL feasible: We demonstrate that agents which query these models for decision making outperform strong model-free baselines on the game MSPACMAN, demonstrating the potential of using learned environment models for planning.
“Machine Theory of Mind”, Neil C. Rabinowitz, Frank Perbet, H. Francis Song, Chiyuan Zhang, S. M. Ali Eslami, Matthew Botvinick (2018-02-21):
Theory of mind (ToM; Premack & Woodruff, 1978) broadly refers to humans’ ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone. Through this process, it acquires a strong prior model for agents’ behaviour, as well as the ability to bootstrap to richer predictions about agents’ characteristics and mental states using only a small number of behavioural observations. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep reinforcement learning agents from varied populations, and that it passes classic ToM tasks such as the "Sally-Anne" test (Wimmer & Perner, 1983; Baron-Cohen et al., 1985) of recognising that others can hold false beliefs about the world. We argue that this system – which autonomously learns how to model other agents in its world – is an important step forward for developing multi-agent AI systems, for building intermediating technology for machine-human interaction, and for advancing the progress on interpretable AI.
We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using far fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.
I apply recent work on "learning to think" (2015) and on PowerPlay (2011) to the incremental training of an increasingly general problem solver, continually learning to solve new tasks without forgetting previous skills. The problem solver is a single recurrent neural network (or similar general purpose computer) called ONE. ONE is unusual in the sense that it is trained in various ways, e.g., by black box optimization / reinforcement learning / artificial evolution as well as supervised / unsupervised learning. For example, ONE may learn through neuroevolution to control a robot through environment-changing actions, and learn through unsupervised gradient descent to predict future inputs and vector-valued reward signals as suggested in 1990. User-given tasks can be defined through extra goal-defining input patterns, also proposed in 1990. Suppose ONE has already learned many skills. Now a copy of ONE can be re-trained to learn a new skill, e.g., through neuroevolution without a teacher. Here it may profit from re-using previously learned subroutines, but it may also forget previous skills. Then ONE is retrained in PowerPlay style (2011) on stored input/output traces of (a) ONE’s copy executing the new skill and (b) previous instances of ONE whose skills are still considered worth memorizing. Simultaneously, ONE is retrained on old traces (even those of unsuccessful trials) to become a better predictor, without additional expensive interaction with the environment. More and more control and prediction skills are thus collapsed into ONE, like in the chunker-automatizer system of the neural history compressor (1991). This forces ONE to relate partially analogous skills (with shared algorithmic information) to each other, creating common subroutines in the form of shared subnetworks of ONE, to greatly speed up subsequent learning of additional, novel but algorithmically related skills.
This paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simulated cars from high-dimensional video input. However, real brains are more powerful in many ways. In particular, they learn a predictive model of their initially unknown environment, and somehow use it for abstract (e.g., hierarchical) planning and reasoning. Guided by algorithmic information theory, we describe RNN-based AIs (RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending sequences of tasks, some of them provided by the user, others invented by the RNNAI itself in a curious, playful fashion, to improve its RNN-based world model. Unlike our previous model-building RNN-based RL machines dating back to 1990, the RNNAI learns to actively query its model for abstract reasoning and planning and decision making, essentially "learning to think." The basic ideas of this report can be applied to many other cases where one RNN-like system exploits the algorithmic information content of another. They are taken from a grant proposal submitted in Fall 2014, and also explain concepts such as "mirror neurons." Experimental results will be described in separate papers.
Background: Biomedical studies with low statistical power are a major concern in the scientific community and are one of the underlying reasons for the reproducibility crisis in science. If randomized clinical trials, which are considered the backbone of evidence-based medicine, also suffer from low power, this could affect medical practice.
Methods: We analysed the statistical power in 137 032 clinical trials between 1975 and 2017 extracted from meta-analyses from the Cochrane database of systematic reviews. We determined study power to detect standardized effect sizes according to Cohen, and in meta-analysis with p-value below 0.05 we based power on the meta-analysed effect size. Average power, effect size and temporal patterns were examined.
Results: The number of trials with power ≥80% was low but increased over time: from 9% in 1975–1979 to 15% in 2010–2014. This increase was mainly due to increasing sample sizes, whilst effect sizes remained stable with a median Cohen’s h of 0.21 (IQR 0.12-0.36) and a median Cohen’s d of 0.31 (0.19-0.51). The proportion of trials with power of at least 80% to detect a standardized effect size of 0.2 (small), 0.5 (moderate) and 0.8 (large) was 7%, 48% and 81%, respectively.
Conclusion: This study demonstrates that sufficient power in clinical trials is still problematic, although the situation is slowly improving. Our data encourages further efforts to increase statistical power in clinical trials to guarantee rigorous and reproducible evidence-based medicine.
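To make the power figures above concrete, here is a small sketch of power for a two-sided, two-sample comparison of means at a given Cohen's d, using the standard normal approximation rather than the exact noncentral t distribution. This is not the paper's code; the function name and the example sample size of 64 per arm are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+


def two_sample_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for Cohen's d.

    Assumptions: equal group sizes, known unit variance (normal approximation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the test statistic under the alternative.
    shift = effect_size_d * sqrt(n_per_group / 2.0)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical trial with 64 patients per arm.
    for d in (0.2, 0.5, 0.8):
        print(d, round(two_sample_power(d, 64), 3))
```

Under this approximation a trial of that size has roughly 80% power for a moderate effect (d = 0.5) but far less for a small one (d = 0.2), which is why median effects near d ≈ 0.3 leave most trials underpowered.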
Problems with social experiments and evaluating them, loopholes, causes, and suggestions; non-experimental methods systematically deliver false results, as most interventions fail or have small effects.
Machine learning-based analysis of human functional magnetic resonance imaging (fMRI) patterns has enabled the visualization of perceptual content. However, it has been limited to the reconstruction with low-level image bases (Miyawaki et al., 2008; Wen et al., 2016) or to the matching to exemplars (Naselaris et al., 2009; Nishimoto et al., 2011). Recent work showed that visual cortical activity can be decoded (translated) into hierarchical features of a deep neural network (DNN) for the same input image, providing a way to make use of the information from hierarchical visual features (Horikawa & Kamitani, 2017). Here, we present a novel image reconstruction method, in which the pixel values of an image are optimized to make its DNN features similar to those decoded from human brain activity at multiple layers. We found that the generated images resembled the stimulus images (both natural images and artificial shapes) and the subjective visual content during imagery. While our model was solely trained with natural images, our method successfully generalized the reconstruction to artificial shapes, indicating that our model indeed ‘reconstructs’ or ‘generates’ images from brain activity, not simply matches to exemplars. A natural image prior introduced by another deep neural network effectively rendered semantically meaningful details to reconstructions by constraining reconstructed images to be similar to natural images. Furthermore, human judgment of reconstructions suggests the effectiveness of combining multiple DNN layers to enhance visual quality of generated images. The results suggest that hierarchical visual information in the brain can be effectively combined to reconstruct perceptual and subjective images.
Deep neural networks (DNNs) have recently been applied successfully to brain decoding and image reconstruction from functional magnetic resonance imaging (fMRI) activity. However, direct training of a DNN with fMRI data is often avoided because the size of available data is thought to be insufficient to train a complex network with numerous parameters. Instead, a pre-trained DNN has served as a proxy for hierarchical visual representations, and fMRI data were used to decode individual DNN features of a stimulus image using a simple linear model, which were then passed to a reconstruction module. Here, we present our attempt to directly train a DNN model with fMRI data and the corresponding stimulus images to build an end-to-end reconstruction model. We trained a generative adversarial network with an additional loss term defined in a high-level feature space (feature loss) using up to 6,000 training data points (natural images and the fMRI responses). The trained deep generator network was tested on an independent dataset, directly producing a reconstructed image given an fMRI pattern as the input. The reconstructions obtained from the proposed method showed resemblance with both natural and artificial test stimuli. The accuracy increased as a function of the training data size, though not outperforming the decoded feature-based method with the available data size. Ablation analyses indicated that the feature loss played a critical role to achieve accurate reconstruction. Our results suggest a potential for the end-to-end framework to learn a direct mapping between brain activity and perception given even larger datasets.
Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by matching the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.
Molecular gastronomy is a subdiscipline of food science that seeks to investigate the physical and chemical transformations of ingredients that occur in cooking. Its program includes three areas, as cooking was recognized to have three components: social, artistic, and technical. Molecular cuisine is a modern style of cooking, and takes advantage of many technical innovations from the scientific disciplines.
Alinea is a restaurant in Chicago, Illinois, United States. In 2010, Alinea was awarded three stars from the Michelin Guide. As of December 20, 2017, Alinea is the only Chicago restaurant to retain a three-star status, Michelin's highest accolade.
Subscription page for the monthly gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives since December 2013.
One defense of free markets notes the inability of non-market mechanisms to solve planning & optimization problems. This has difficulty with Coase’s paradox of the firm, and I note that the difficulty is increased by the fact that with improvements in computers, algorithms, and data, ever larger planning problems are solved. Expanding on some Cosma Shalizi comments, I suggest interpreting this phenomenon as a multi-level nested optimization paradigm: many systems can be usefully described as having two (or more) levels where a slow sample-inefficient but ground-truth ‘outer’ loss such as death, bankruptcy, or reproductive fitness, trains & constrains a fast sample-efficient but possibly misguided ‘inner’ loss which is used by learned mechanisms such as neural networks or linear programming (a group selection perspective). So, one reason for free-market or evolutionary or Bayesian methods in general is that while poorer at planning/optimization in the short run, they have the advantage of simplicity and operating on ground-truth values, and serve as a constraint on the more sophisticated non-market mechanisms. I illustrate by discussing corporations, multicellular life, reinforcement learning & meta-learning in AI, and pain in humans. This view suggests that there are inherent balances between market/non-market mechanisms which reflect the relative advantages between a slow unbiased method and faster but potentially arbitrarily biased methods.
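The two-level picture can be caricatured in a few lines. In this deliberately tiny sketch (entirely hypothetical: the quadratic losses, the constant 3.0, and all names are illustrative assumptions, not anything from the essay), a fast inner learner does gradient descent on a proxy loss whose target may be wrong, while a slow outer loop sees only one ground-truth score per "lifetime" and uses mutate-and-select to adjust the proxy.

```python
import random

TRUE_OPTIMUM = 3.0  # hypothetical ground truth, never shown to the inner learner


def inner_learn(proxy_target, steps=50, lr=0.1):
    """Fast inner loop: gradient descent on the proxy loss (x - proxy_target)**2."""
    x = 0.0
    for _ in range(steps):
        x -= lr * 2.0 * (x - proxy_target)
    return x


def outer_loss(x):
    """Slow ground-truth loss, observed only as one scalar per 'lifetime'."""
    return (x - TRUE_OPTIMUM) ** 2


def evolve_proxy(generations=300, mutation_sd=0.5, seed=0):
    """Outer loop: keep a mutated proxy only if the ground-truth loss improves."""
    rng = random.Random(seed)
    proxy = 0.0  # initially misguided inner objective
    best = outer_loss(inner_learn(proxy))
    for _ in range(generations):
        mutant = proxy + rng.gauss(0.0, mutation_sd)
        loss = outer_loss(inner_learn(mutant))
        if loss < best:
            proxy, best = mutant, loss
    return proxy, best


if __name__ == "__main__":
    print(evolve_proxy())
```

The inner loop gets 50 gradient steps of detailed feedback per lifetime while the outer loop gets a single number, yet only the outer loop can correct a proxy that points the wrong way; that asymmetry is the balance the paragraph describes.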