“The Power and the Gory”, Solotaroff 1990 (Gonzo journalism on steroids & bodybuilding; multiple waves in bodybuilding track scientific/technological/economic progress in hormone/drug development)
Geometry Wars 3: Dimensions (purchased after a Linux upgrade killed my copy of GridWars 2; GW2 had entranced me after reading World of Stuart’s review and being intrigued by the described strategic gameplay of black hole farming. After some experimentation, I discovered that the control combination of a mouse for movement and the d-pad for firing offered me an extraordinary level of laser-sharp eyeblink-fast control which unlocked the higher reaches of black hole farming, allowing for intense trance-inducing games lasting up to my high score of 35m30s/2,945,963 points. Naturally I assumed as a ‘clone’ that GW2’s mouse-keyboard control scheme was drawn from the original Geometry Wars. Alas! GW3, despite many years of development, offers naught but the twin-sticks Xbox gamepad and a clumsy keyboard-only control scheme. Playing even the simplest level of GW3 feels like wading through mud—after being anesthetized. Going back would be as difficult as using a push lawn mower after a z-track riding lawn mower, a HDD after an SSD, a mouse after a trackball… GW3 boasts many levels and modes, but I fear they all must be enjoyed through the same control scheme. If I had not played GW2, perhaps I could appreciate GW3 for what it is, but I have been spoiled. I will simply have to figure out how to get GW2 running again, in a VM if I must.)
Newsletter tag: archive of all issues back to 2013 for the gwern.net newsletter (monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.)
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up reasonably-interesting changes and send it out to the mailing list in addition to a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as an RSS feed). Submissions are categorized similarly to the monthly newsletter and typically will be collated there.
“Multi-trait analysis of genome-wide association summary statistics using MTAG”, Patrick Turley, Raymond K. Walters, Omeed Maghzian, Aysu Okbay, James J. Lee, Mark Alan Fontana, Tuan Anh Nguyen-Viet, Robbee Wedow, Meghan Zacher, Nicholas A. Furlotte, 23andMe Research Team, Social Science Genetic Association Consortium, Patrik Magnusson, Sven Oskarsson, Magnus Johannesson, Peter M. Visscher, David Laibson, David Cesarini, Benjamin M. Neale, Daniel J. Benjamin (2017-10-23):
We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (Neff = 354,862), neuroticism (n = 168,105), and subjective well-being (n = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
Intelligence, or general cognitive function, is phenotypically and genetically correlated with many traits, including many physical and mental health variables. Both education and household income are strongly genetically correlated with intelligence, at rg = 0.73 and rg = 0.70 respectively. This allowed us to utilize a novel approach, Multi-Trait Analysis of Genome-wide association studies (MTAG; Turley et al. 2017), to combine two large genome-wide association studies (GWASs) of education and household income to increase power in the largest GWAS on intelligence so far (Sniekers et al. 2017). This study had four goals: firstly, to facilitate the discovery of new genetic loci associated with intelligence; secondly, to add to our understanding of the biology of intelligence differences; thirdly, to examine whether combining genetically correlated traits in this way produces results consistent with the primary phenotype of intelligence; and, finally, to test how well this new meta-analytic data sample on intelligence predicts phenotypic intelligence variance in an independent sample. We apply MTAG to three large GWAS: Sniekers et al. (2017) on intelligence, Okbay et al. (2016) on educational attainment, and Hill et al. (2016) on household income. By combining these three samples our functional sample size increased from 78 308 participants to 147 194. We found 107 independent loci associated with intelligence, implicating 233 genes, using both SNP-based and gene-based GWAS. We find evidence that neurogenesis may explain some of the biological differences in intelligence as well as genes expressed in the synapse and those involved in the regulation of the nervous system. We show that the results of our combined analysis demonstrate the same pattern of genetic correlations as a single measure/the simple measure of intelligence, providing support for the meta-analysis of these genetically-related phenotypes.
We find that our MTAG meta-analysis of intelligence shows similar genetic correlations to 26 other phenotypes when compared with a GWAS consisting solely of cognitive tests. Finally, using an independent sample of 6 844 individuals we were able to predict 7% of intelligence using SNP data alone.
In multivariate quantitative genetics, a genetic correlation is the proportion of variance that two traits share due to genetic causes, the correlation between the genetic influences on a trait and the genetic influences on a different trait estimating the degree of pleiotropy or causal overlap. A genetic correlation of 0 implies that the genetic effects on one trait are independent of the other, while a correlation of 1 implies that all of the genetic influences on the two traits are identical. The bivariate genetic correlation can be generalized to inferring genetic latent variable factors across > 2 traits using factor analysis. Genetic correlation models were introduced into behavioral genetics in the 1970s–1980s.
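As a rough illustration of the definition above, the genetic correlation is conventionally written as the genetic covariance between two traits divided by the geometric mean of their genetic variances. The sketch below assumes those three quantities have already been estimated somehow; the function name and the numbers are hypothetical and come from no study cited here.

```python
import math

def genetic_correlation(cov_g, var_g_x, var_g_y):
    """r_g = cov_g(X, Y) / sqrt(var_g(X) * var_g(Y)).

    Inputs are assumed to be already-estimated genetic (co)variances;
    how they are estimated is outside the scope of this sketch.
    """
    if var_g_x <= 0 or var_g_y <= 0:
        raise ValueError("genetic variances must be positive")
    return cov_g / math.sqrt(var_g_x * var_g_y)

# Hypothetical values, chosen only for illustration.
r_g = genetic_correlation(cov_g=0.21, var_g_x=0.30, var_g_y=0.30)
print(round(r_g, 2))
```

A genetic covariance of 0 gives a correlation of 0 (independent genetic effects), and a covariance equal to both variances gives 1 (identical genetic influences), matching the two boundary cases described in the text.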
We used a case-control genome-wide association (GWA) design with cases consisting of 1238 individuals from the top 0.0003 (~170 mean IQ) of the population distribution of intelligence and 8172 unselected population-based controls. The single-nucleotide polymorphism heritability for the extreme IQ trait was 0.33 (0.02), which is the highest so far for a cognitive phenotype, and significant genome-wide genetic correlations of 0.78 were observed with educational attainment and 0.86 with population IQ. Three variants in locus ADAM12 achieved genome-wide significance, although they did not replicate with published GWA analyses of normal-range IQ or educational attainment. A genome-wide polygenic score constructed from the GWA results accounted for 1.6% of the variance of intelligence in the normal range in an unselected sample of 3414 individuals, which is comparable to the variance explained by GWA studies of intelligence with substantially larger sample sizes. The gene family plexins, members of which are mutated in several monogenic neurodevelopmental disorders, was significantly enriched for associations with high IQ. This study shows the utility of extreme trait selection for genetic study of intelligence and suggests that extremely high intelligence is continuous genetically with normal-range intelligence in the population.
“Genome-wide genetic data on ~500,000 UK Biobank participants”, Clare Bycroft, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T. Elliott, Kevin Sharp, Allan Motyer, Damjan Vukcevic, Olivier Delaneau, Jared O’Connell, Adrian Cortes, Samantha Welsh, Gil McVean, Stephen Leslie, Peter Donnelly, Jonathan Marchini (2017-07-20):
The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged between 40-69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data – such as population structure and relatedness – that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.
Neuroticism is a stable personality trait 1; twin studies report heritability between 30% and 50% 2, and SNP-based heritability is about 15% 3. Higher levels of neuroticism are associated with poorer mental and physical health 4,5, and the economic burden of neuroticism for societies is high 6. To date, genome-wide association (GWA) studies of neuroticism have identified up to 11 genetic loci 3,7. Here we report 116 significant independent genetic loci from a GWA of neuroticism in 329,821 UK Biobank participants, with replication available in a GWA meta-analysis of neuroticism in 122,867 individuals. Genetic signals for neuroticism were enriched in neuronal genesis and differentiation pathways, and substantial genetic correlations were found between neuroticism and depressive symptoms (rg = 0.82, SE=.03), major depressive disorder (rg = 0.69, SE=.07) and subjective wellbeing (rg = -.68, SE=.03) alongside other mental health traits. These discoveries significantly advance our understanding of neuroticism and its association with major depressive disorder.
“Quantifying the impact of rare and ultra-rare coding variation across the phenotypic spectrum”, Andrea Ganna, F. Kyle Satterstrom, Seyedeh M. Zekavat, Indraniel Das, Mitja I. Kurki, Claire Churchhouse, Jessica Alfoldi, Alicia R. Martin, Aki S. Havulinna, Andrea Byrnes, Wesley K. Thompson, Philip R. Nielsen, Konrad J. Karczewski, Elmo Saarentaus, Manuel A. Rivas, Namrata Gupta, Olli Pietiläinen, Connor A. Emdin, Francesco Lescai, Jonas Bybjerg-Grauholm, Jason Flannick, on behalf of GoT2D/T2D-GENES consortium, Josep Mercader, Miriam Udlerg, on behalf of SIGMA consortium, Helmsley IBD Exome Sequencing Project, FinMetSeq Consortium, iPSYCH-Broad Consortium, Markku Laakso, Veikko Salomaa, Christina Hultman, Samuli Ripatti, Eija Hämäläinen, Jukka S. Moilanen, Jarmo Körkkö, Outi Kuismin, Merete Nordentoft, David M. Hougaard, Ole Mors, Thomas Werge, Preben Bo Mortensen, Daniel MacArthur, Mark J. Daly, Patrick F. Sullivan, Adam E. Locke, Aarno Palotie, Anders D. Børglum, Sekar Kathiresan, Benjamin M. Neale (2017-06-09):
Protein truncating variants (PTVs) are likely to modify gene function and have been linked to hundreds of Mendelian disorders 1,2. However, the impact of PTVs on complex traits has been limited by the available sample size of whole-exome sequencing studies (WES) 3. Here we assemble WES data from 100,304 individuals to quantify the impact of rare PTVs on 13 quantitative traits and 10 diseases. We focus on those PTVs that occur in PTV-intolerant (PI) genes, as these are more likely to be pathogenic. Carriers of at least one PI-PTV were found to have an increased risk of autism, schizophrenia, bipolar disorder, intellectual disability and ADHD (p-value (p) range: 5×10^−3 to 9×10^−12). In controls, without these disorders, we found that this burden associated with increased risk of mental, behavioral and neurodevelopmental disorders as captured by electronic health record information. Furthermore, carriers of PI-PTVs tended to be shorter (p = 2×10^−5), have fewer years of education (p = 2×10^−4) and be younger (p = 2×10^−7); the latter observation possibly reflecting reduced survival or study participation. While other gene-sets derived from in vivo experiments did not show any associations with PTV-burden, gene sets implicated in GWAS of cardiovascular-related traits and inflammatory bowel disease showed a significant PTV-burden with corresponding traits, mainly driven by established genes involved in familial forms of these disorders. We leveraged population health registries from 14,117 individuals to study the phenome-wide impact of PI-PTVs and identified an increase in the number of hospital visits among PI-PTV carriers. In conclusion, we provide the most thorough investigation to date of the impact of rare deleterious coding variants on complex traits, suggesting widespread pleiotropic risk.
Helmsley IBD Exome Sequencing Project: Dermot McGovern, Judy H Cho, Ann Pulver, Vincent Plagnol, Tony Segal, Gil Atzmon, Dan Turner, Ben Glaser, Inga Peter, Ramnik Xavier, Harry Sokol, Rinse Weersma, Andre Franke, John Rioux, Tariq Ahmad, Martti Färkkilä, Kimmo Kontula.
FinMetSeq Consortium: Haley J Abel, Michael Boehnke, Lei Chen, Charleston WK Chiang, Colby C Chiang, Susan K Dutcher, Nelson B Freimer, Robert S Fulton, Liron Ganel, Ira M Hall, Anne U Jackson, Krishna L Kanchi, Chul Joo Kang, Daniel C Koboldt, Hannele Laivuori, David E Larson, Karyn Meltz Steinberg, Joanne Nelson, Thomas J Nicholas, Arto Pietilä, Matti Pirinen, Vasily Ramensky, Debashree Ray, Chiara Sabatti, Laura J Scott, Susan Service, Laurel Stell, Nathan O Stitziel, Heather M Stringham, Ryan Welch, Richard K Wilson, Pranav Yajnik.
iPSYCH-Broad Consortium: Marianne G Pedersen, Marie Bækvad-Hansen, Christine S Hansen.
The ability of a population to adapt to changes in their living conditions, whether in nature or captivity, often depends on polymorphisms in multiple genes across the genome. In-depth studies of such polygenic adaptations are difficult in natural populations, but can be approached using the resources provided by artificial selection experiments. Here, we dissect the genetic mechanisms involved in long-term selection responses of the Virginia chicken lines, populations that after 40 generations of divergent selection for 56-day body weight display a nine-fold difference in the selected trait. In the F15 generation of an intercross between the divergent lines, 20 loci explained more than 60% of the additive genetic variance for the selected trait. We focused particularly on seven major QTL and found that only two fine-mapped to single, bi-allelic loci; the other five contained linked loci, multiple alleles or were epistatic. This detailed dissection of the polygenic adaptations in the Virginia lines provides a deeper understanding of genome-wide mechanisms involved in the long-term selection responses. The results illustrate that long-term selection responses, even from populations with a limited genetic diversity, can be polygenic and influenced by a range of genetic mechanisms.
Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome wide association studies (GWAS) now enable the study of genetic adaptation in highly polygenic phenotypes. Here we test for polygenic adaptation among 187 world-wide human populations using polygenic scores constructed from GWAS of 34 complex traits. Comparing these polygenic scores to a null distribution under genetic drift, we identify strong signals of selection for a suite of anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR), as well as type 2 diabetes (T2D). In addition to the known north-south gradient of polygenic height scores within Europe, we find that natural selection has contributed to a gradient of decreasing polygenic height scores from West to East across Eurasia. Analyzing a set of ancient DNA samples from across Eurasia, we show that much of this gradient can be explained by selection for increased height in two long diverged hunter-gatherer populations living in western and west-central Eurasia sometime during or shortly after the last glacial maximum. We find that the signal of selection on hip circumference can largely be explained as a correlated response to selection on height. However, our signals in IHC and WHR cannot, suggesting that these patterns are the result of selection along multiple axes of body shape variation. Our observation that IHC and WHR polygenic scores follow a strong latitudinal cline in Western Eurasia support the role of natural selection in establishing Bergmann’s Rule in humans, and are consistent with thermoregulatory adaptation in response to latitudinal temperature variation.
One Sentence Summary
Natural selection has led to divergence in multiple quantitative traits in humans across Eurasian populations.
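For readers unfamiliar with the term, a polygenic score in its most basic form is a weighted sum: each variant's GWAS effect size multiplied by the number of effect alleles carried, added up over variants. The sketch below shows only that arithmetic. The effect sizes and genotypes are made up, and the abstract's actual analysis (population-level scores tested against a drift null) is considerably more involved.

```python
def polygenic_score(effect_sizes, allele_counts):
    """Sum of per-variant effect size times effect-allele count (0, 1 or 2)."""
    if len(effect_sizes) != len(allele_counts):
        raise ValueError("need one allele count per effect size")
    return sum(beta * count for beta, count in zip(effect_sizes, allele_counts))

# Hypothetical three-variant example.
betas = [0.5, -0.25, 1.0]   # made-up GWAS effect sizes
genotype = [2, 1, 0]        # copies of the effect allele at each variant
print(polygenic_score(betas, genotype))
```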
Subreddit devoted to discussion of reinforcement learning research and projects, particularly deep reinforcement learning (more specialized than /r/MachineLearning). Major themes include deep learning, model-based vs model-free RL, robotics, multi-agent RL, exploration, meta-reinforcement learning, imitation learning, the psychology of RL in biological organisms such as humans, and safety/AI risk. Moderate activity level (as of 2019-09-11): ~10k subscribers, 2k pageviews daily.
“Learning model-based planning from scratch”, Razvan Pascanu, Yujia Li, Oriol Vinyals, Nicolas Heess, Lars Buesing, Sebastien Racanière, David Reichert, Théophane Weber, Daan Wierstra, Peter Battaglia (2017-07-19):
Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a "plan context" which conditions future real and imagined actions. The agent can even decide how to imagine: testing out alternative imagined actions, chaining sequences of actions together, or building a more complex "imagination tree" by navigating flexibly among the previously imagined states using a learned policy. And our agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated with using its imagination. We show that our architecture can learn to solve a challenging continuous control problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new direction toward learning the components of a model-based planning system and how to use them.
“Imagination-Augmented Agents for Deep Reinforcement Learning”, Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra (2017-07-19):
We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.
In this paper, we introduce Path Integral Networks (PI-Net), a recurrent network representation of the Path Integral optimal control algorithm. The network includes both system dynamics and cost models, used for optimal control based planning. PI-Net is fully differentiable, learning both dynamics and cost models end-to-end by back-propagation and stochastic gradient descent. Because of this, PI-Net can learn to plan. PI-Net has several advantages: it can generalize to unseen states thanks to planning, it can be applied to continuous control tasks, and it allows for a wide variety of learning schemes, including imitation and reinforcement learning. Preliminary experiment results show that PI-Net, trained by imitation learning, can mimic control demonstrations for two simulated problems: a linear system and a pendulum swing-up problem. We also show that PI-Net is able to learn dynamics and cost models latent in the demonstrations.
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
We introduce a method for learning the dynamics of complex nonlinear systems based on deep generative models over temporal segments of states and actions. Unlike dynamics models that operate over individual discrete timesteps, we learn the distribution over future state trajectories conditioned on past state, past action, and planned future action trajectories, as well as a latent prior over action trajectories. Our approach is based on convolutional autoregressive models and variational autoencoders. It makes stable and accurate predictions over long horizons for complex, stochastic systems, effectively expressing uncertainty and modeling the effects of collisions, sensory noise, and action delays. The learned dynamics model and action prior can be used for end-to-end, fully differentiable trajectory optimization and model-based policy optimization, which we use to evaluate the performance and sample-efficiency of our method.
Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf
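To make the idea of combining a dynamics model with model predictive control concrete, here is a minimal sketch. It assumes a random-shooting planner: sample candidate action sequences, roll each through the model, execute only the first action of the cheapest sequence, then replan. The abstract does not say which planner is used, so that choice is an assumption, and `dynamics` is a hand-written stand-in for a learned neural network. All names and constants are hypothetical.

```python
import random

def dynamics(state, action):
    # Stand-in for a learned model: a 1-D point that moves by the action.
    return state + action

def cost(state, target):
    # Squared distance to the target position.
    return (state - target) ** 2

def plan_action(state, target, rng, horizon=5, n_candidates=300):
    """Random shooting: return the first action of the cheapest sampled sequence."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = dynamics(s, a)      # imagined rollout through the model
            total += cost(s, target)
        if total < best_cost:
            best_cost, best_first = total, actions[0]
    return best_first

def run_mpc(start, target, steps=20, seed=0):
    """Replan at every step and execute only the first planned action."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        state = dynamics(state, plan_action(state, target, rng))
    return state

print(run_mpc(start=0.0, target=3.0))
```

Replanning at every step is what lets an imperfect model still produce usable behaviour: errors in the imagined rollout are corrected as soon as the real next state is observed.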
Generative adversarial learning is a popular new approach to training generative models which has been proven successful for other related problems as well. The general idea is to maintain an oracle D that discriminates between the expert’s data distribution and that of the generative model G. The generative model is trained to capture the expert’s distribution by maximizing the probability of D misclassifying the data it generates. Overall, the system is differentiable end-to-end and is trained using basic backpropagation. This type of learning was successfully applied to the problem of policy imitation in a model-free setup. However, a model-free approach does not allow the system to be differentiable, which requires the use of high-variance gradient estimations. In this paper we introduce the Model based Adversarial Imitation Learning (MAIL) algorithm, a model-based approach for the problem of adversarial imitation learning. We show how to use a forward model to make the system fully differentiable, which enables us to train policies using the (stochastic) gradient of D. Moreover, our approach requires relatively few environment interactions, and fewer hyper-parameters to tune. We test our method on the MuJoCo physics simulator and report initial results that surpass the current state-of-the-art.
We present a new approach to learning for planning, where knowledge acquired while solving a given set of planning problems is used to plan faster in related, but new problem instances. We show that a deep neural network can be used to learn and represent a generalized reactive policy (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances. In contrast to prior efforts in this direction, our approach significantly reduces the dependence of learning on handcrafted domain knowledge or feature selection. Instead, the GRP is trained from scratch using a set of successful execution traces. We show that our approach can also be used to automatically learn a heuristic function that can be used in directed search algorithms. We evaluate our approach using an extensive suite of experiments on two challenging planning problem domains and show that our approach facilitates learning complex decision making policies and powerful heuristic functions with minimal human input. Videos of our results are available at goo.gl/Hpy4e3.
A key challenge in scaling up robot learning to many skills and environments is removing the need for human supervision, so that robots can collect their own data and improve their own performance without being limited by the cost of requesting human feedback. Model-based reinforcement learning holds the promise of enabling an agent to learn to predict the effects of its actions, which could provide flexible predictive models for a wide range of tasks and environments, without detailed human supervision. We develop a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data. Our approach does not require a calibrated camera, an instrumented training set-up, nor precise sensing and actuation. Our results show that our method enables a real robot to perform nonprehensile manipulation – pushing objects – and can handle novel objects not seen during training.
AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven’t yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human "in the loop" and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human’s intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent’s learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.
Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. The key contribution of this work is the design of a new search space (the "NASNet search space") which enables transferability. In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, named "NASNet architecture". We also introduce a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models. On CIFAR-10 itself, NASNet achieves 2.4% error rate, which is state-of-the-art. On ImageNet, NASNet achieves, among the published works, state-of-the-art accuracy of 82.7% top-1 and 96.2% top-5 on ImageNet. Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS, a reduction of 28% in computational demand from the previous state-of-the-art model. When evaluated at different levels of computational cost, accuracies of NASNets exceed those of the state-of-the-art human-designed models. For instance, a small version of NASNet also achieves 74% top-1 accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. Finally, the learned features by NASNet used with the Faster-RCNN framework surpass state-of-the-art by 4.0%, achieving 43.1% mAP on the COCO dataset.
Techniques for automatically designing deep neural network architectures such as reinforcement learning based approaches have recently shown promising results. However, their success is based on vast computational resources (e.g. hundreds of GPUs), making them difficult to use widely. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework toward efficient architecture search by exploring the architecture space based on the current network and reusing its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, the previously validated networks can be reused for further exploration, thus saving a large amount of computational cost. We apply our method to explore the architecture space of the plain convolutional neural networks (no skip-connections, branching etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.
We introduce techniques for rapidly transferring the information stored in one neural net into another neural net. The main purpose is to accelerate the training of a significantly larger neural net. During real-world workflows, one often trains very many different neural networks during the experimentation and design process. This is a wasteful process in which each new model is trained from scratch. Our Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network. Our techniques are based on the concept of function-preserving transformations between neural network specifications. This differs from previous approaches to pre-training that altered the function represented by a neural net when adding layers to it. Using our knowledge transfer mechanism to add depth to Inception modules, we demonstrate a new state of the art accuracy rating on the ImageNet dataset.
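To make "function-preserving transformation" concrete, here is a minimal sketch of widening one hidden layer: a hidden unit is duplicated and its outgoing weights are split in half, so the wider network computes the same output. The two-layer ReLU network, the helper names (`forward`, `widen`), and the choice of which unit to copy are all illustrative assumptions, not the paper's code.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, x):
    # w is a list of rows; returns the matrix-vector product w @ x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def forward(w1, w2, x):
    # Two-layer network: output = w2 @ relu(w1 @ x).
    return matvec(w2, relu(matvec(w1, x)))

def widen(w1, w2, unit):
    """Duplicate hidden unit `unit` and split its outgoing weights in half."""
    new_w1 = [row[:] for row in w1] + [w1[unit][:]]  # copy incoming weights
    new_w2 = []
    for row in w2:
        half = row[unit] / 2.0
        new_row = row[:]
        new_row[unit] = half   # the original unit keeps half the weight
        new_row.append(half)   # the copy carries the other half
        new_w2.append(new_row)
    return new_w1, new_w2

rng = random.Random(0)
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs -> 4 hidden
w2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 hidden -> 2 outputs
x = [0.5, -1.0, 2.0]

before = forward(w1, w2, x)
wide_w1, wide_w2 = widen(w1, w2, unit=1)
after = forward(wide_w1, wide_w2, x)  # same values as `before`, up to rounding
```

Because the copy starts out identical to the original unit, training can continue from the wider network without first relearning what the smaller one already knew.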
“Noisy Networks for Exploration”, Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg (2017-06-30):
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent’s policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and ϵ-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.
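As a rough illustration of "parametric noise added to its weights", the sketch below runs the forward pass of one linear layer whose effective weights are a mean plus a scaled Gaussian sample. The function name `noisy_linear`, the independent per-weight noise, and the example values are assumptions made for illustration; in the paper the means and noise scales are both trained by gradient descent, which is not shown here.

```python
import random

def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """Forward pass of one linear layer with Gaussian noise on weights and biases."""
    out = []
    for j in range(len(mu_b)):
        # Effective bias = mean + scale * fresh standard-normal sample.
        total = mu_b[j] + sigma_b[j] * rng.gauss(0.0, 1.0)
        for i, xi in enumerate(x):
            # Effective weight is perturbed the same way, independently per weight.
            w = mu_w[j][i] + sigma_w[j][i] * rng.gauss(0.0, 1.0)
            total += w * xi
        out.append(total)
    return out

rng = random.Random(0)
x = [1.0, 1.0]
mu_w = [[1.0, 2.0], [0.5, -1.0]]
mu_b = [0.0, 1.0]
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
small_w = [[0.1, 0.1], [0.1, 0.1]]
small_b = [0.1, 0.1]

# With zero noise scale the layer reduces to an ordinary linear layer.
quiet = noisy_linear(x, mu_w, zeros_w, mu_b, zeros_b, rng)
# With non-zero scale, repeated calls give different outputs (and hence actions).
noisy_a = noisy_linear(x, mu_w, small_w, mu_b, small_b, rng)
noisy_b = noisy_linear(x, mu_w, small_w, mu_b, small_b, rng)
```

The exploration comes from the perturbed outputs themselves, so no separate ϵ-greedy or entropy term is needed in this setup.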
This paper introduces the Intentional Unintentional (IU) agent. This agent endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but it also learns faster than agents that target a single task at-a-time. In some cases, where the single task DDPG method completely fails, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine, and introduce a grounded formal language to automatically generate tasks.
Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable—attributes that are critical in deep reinforcement learning.
Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert’s cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.
Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-dimensional humanoid bodies. However, methods that use pure reinforcement learning with simple reward functions tend to produce non-humanlike and overly stereotyped movement behaviors. In this work, we extend generative adversarial imitation learning to enable training of generic neural network policies to produce humanlike movement patterns from limited demonstrations consisting only of partially observed state features, without access to actions, even when the demonstrations come from a body with different and unknown physical parameters. We leverage this approach to build sub-skill policies from motion capture data and show that they can be reused to solve tasks when controlled by a higher level controller.
“Deep Q-learning from Demonstrations”, Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys (2017-04-12):
Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstrator’s actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it takes PDD DQN 83 million steps to catch up to DQfD’s performance. DQfD learns to out-perform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.
Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in […] of the captured video frames obtained on a moving vehicle (field test) for the target classifier.
Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, many recent meta-learning approaches are extensively hand-designed, either using architectures specialized to a particular application, or hard-coding algorithmic components that constrain how the meta-learner solves the task. We propose a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information. In the most extensive set of meta-learning experiments to date, we evaluate the resulting Simple Neural AttentIve Learner (or SNAIL) on several heavily-benchmarked tasks. On all tasks, in both supervised and reinforcement learning, SNAIL attains state-of-the-art performance by significant margins.
I give a brief overview of arXiv history, and describe the current state of arXiv practice, both technical and sociological. This commentary originally appeared in the EMBO Journal, 19 Oct 2016. It was intended as an update on comments from the late 1990s regarding use of preprints by biologists (or lack thereof), but may be of interest to practitioners of other disciplines. It is based largely on a keynote presentation I gave to the ASAPbio inaugural meeting in Feb 2016, and responds as well to some follow-up questions.
arXiv is an open-access repository of electronic preprints approved for posting after moderation, but not peer review. It consists of scientific papers in the fields of mathematics, physics, astronomy, electrical engineering, computer science, quantitative biology, statistics, mathematical finance and economics, which can be accessed online. In many fields of mathematics and physics, almost all scientific papers are self-archived on the arXiv repository before publication in a peer-reviewed journal. Begun on August 14, 1991, arXiv.org passed the half-million-article milestone on October 3, 2008, and had hit a million by the end of 2014. By October 2016, the submission rate had grown to more than 10,000 per month.
Background: Despite substantial financial contributions by the United States President’s Malaria Initiative (PMI) since 2006, no studies have carefully assessed how this program may have affected important population-level health outcomes. We utilized multiple publicly available data sources to evaluate the association between introduction of PMI and child mortality rates in sub-Saharan Africa (SSA).
Methods and findings: We used difference-in-differences analyses to compare trends in the primary outcome of under-5 mortality rates and secondary outcomes reflecting population coverage of malaria interventions in 19 PMI-recipient and 13 non-recipient countries between 1995 and 2014. The analyses controlled for presence and intensity of other large funding sources, individual and household characteristics, and country and year fixed effects.
PMI program implementation was associated with a significant reduction in the annual risk of under-5 child mortality (adjusted risk ratio [RR] 0.84, 95% CI 0.74–0.96). Each dollar of per-capita PMI expenditures in a country, a measure of PMI intensity, was also associated with a reduction in child mortality (RR 0.86, 95% CI 0.78–0.93). We estimated that the under-5 mortality rate in PMI countries was reduced from 28.9 to 24.3 per 1,000 person-years. Population coverage of insecticide-treated nets increased by 8.34 percentage points (95% CI 0.86–15.83) and coverage of indoor residual spraying increased by 6.63 percentage points (95% CI 0.79–12.47) after PMI implementation. Per-capita PMI spending was also associated with a modest increase in artemisinin-based combination therapy coverage (3.56 percentage point increase, 95% CI −0.07–7.19), though this association was only marginally significant (p = 0.054). Our results were robust to several sensitivity analyses. Because our study design leaves open the possibility of unmeasured confounding, we cannot definitively interpret these results as causal.
Conclusions: PMI may have significantly contributed to reducing the burden of malaria in SSA and reducing the number of child deaths in the region. Introduction of PMI was associated with increased coverage of malaria prevention technologies, which are important mechanisms through which child mortality can be reduced. To our knowledge, this study is the first to assess the association between PMI and all-cause child mortality in SSA with the use of appropriate comparison groups and adjustments for regional trends in child mortality.
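The headline numbers in this abstract can be cross-checked with simple arithmetic. The sketch below only restates figures quoted above (the adjusted RR of 0.84 and the estimated rates of 28.9 and 24.3 per 1,000 person-years); the variable names are illustrative.

```python
# Adjusted risk ratio reported for PMI implementation.
rr = 0.84
relative_reduction = 1.0 - rr  # fraction by which annual risk falls

# Estimated under-5 mortality rates per 1,000 person-years.
rate_before = 28.9
rate_after = 24.3
implied_ratio = rate_after / rate_before        # should sit close to the RR
absolute_reduction = rate_before - rate_after   # deaths averted per 1,000 person-years

print(f"relative reduction: {relative_reduction:.0%}")
print(f"implied rate ratio: {implied_ratio:.3f}")
print(f"absolute reduction: {absolute_reduction:.1f} per 1,000 person-years")
```

The ratio of the two estimated rates rounds to the reported risk ratio, so the two figures are consistent with each other.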
Why was this study done?
Despite the considerable investment the US government has made in the President’s Malaria Initiative (PMI) since 2006, no studies to date have evaluated its association with population health outcomes.
Previous evaluations have documented decreasing child mortality and increasing use of key malaria interventions in PMI-recipient countries. Our study sought to determine whether the trends in health outcomes in PMI-recipient countries differed significantly from the trends in these outcomes in PMI non-recipient countries in sub-Saharan Africa (SSA) over the past 2 decades.
What did the researchers do and find?
We used a study design that leveraged multiple publicly available data sources from countries throughout SSA, spanning the years before and after PMI introduction, in order to estimate association between the introduction of PMI and child mortality rates.
Our dataset included 7,752,071 child-year observations from 2,112,951 individual children who lived in 32 sub-Saharan countries, including all 19 PMI countries.
We found that after adjusting for baseline differences between countries, overall time trends, other funding sources, and individual characteristics, PMI was associated with 16% annual risk reduction in child mortality and increased population coverage of key malaria prevention and treatment technologies.
We tested the robustness of our results with a series of sensitivity analyses.
What do these findings mean?
The study provides evidence that introduction of PMI was associated with significant reductions in child mortality in SSA, primarily through increased access to malaria prevention technologies.
Evidence from this study can be used to inform policy decisions about future funding levels for malaria interventions.
The interpretation of our study results rests on the assumption that there were no important unmeasured variables that differentially affected mortality rates in PMI and comparison countries during the study period.
Suicide is a major public health concern. High-dose lithium is used to stabilize mood and prevent suicide in patients with affective disorders. Lithium occurs naturally in drinking water worldwide in much lower doses, but with large geographical variation. Several studies conducted at an aggregate level have suggested an association between lithium in drinking water and a reduced risk of suicide; however, a causal relation is uncertain. Individual-level register-based data on the entire Danish adult population (3.7 million individuals) from 1991 to 2012 were linked with a moving five-year time-weighted average (TWA) lithium exposure level from drinking water hypothesizing an inverse relationship. The mean lithium level was 11.6 μg/L ranging from 0.6 to 30.7 μg/L. The suicide rate decreased from 29.7 per 100,000 person-years at risk in 1991 to 18.4 per 100,000 person-years in 2012. We found no significant indication of an association between increasing five-year TWA lithium exposure level and decreasing suicide rate. The comprehensiveness of using individual-level data and spatial analyses with 22 years of follow-up makes a pronounced contribution to previous findings. Our findings demonstrate that there does not seem to be a protective effect of exposure to lithium on the incidence of suicide with levels below 31 μg/L in drinking water. [Keywords: drinking water; lithium; suicide; individual-level data; spatial analysis; Denmark; exposure assessment]
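A time-weighted average weights each exposure level by how long it applied. The sketch below is one plausible way to compute it for a single five-year window; the function name, the input layout, and the example concentrations are hypothetical, and the study's actual exposure model may differ.

```python
def time_weighted_average(periods):
    """Average concentration weighted by duration.

    periods: list of (duration_years, lithium_ug_per_l) tuples covering
    the window of interest, e.g. one tuple per residence.
    """
    total_time = sum(duration for duration, _ in periods)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    weighted_sum = sum(duration * level for duration, level in periods)
    return weighted_sum / total_time

# Hypothetical person: 3 years at 5.0 ug/L, then 2 years at 20.0 ug/L.
window = [(3.0, 5.0), (2.0, 20.0)]
twa = time_weighted_average(window)  # (3*5 + 2*20) / 5
```

A moving five-year TWA would apply the same calculation to the trailing five years at each point of follow-up.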
The power of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true; i.e., it indicates the probability of avoiding a type II error. The statistical power ranges from 0 to 1, and as statistical power increases, the probability of making a type II error decreases.
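As a worked example, the sketch below computes the power of a one-sided z-test with known standard deviation, i.e. the probability of rejecting the null when the true mean equals a given effect. The function name, the choice of test, and the example numbers are assumptions for illustration only.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mean = 0 against H1: mean = effect > 0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)  # rejection threshold under H0
    shift = effect * (n ** 0.5) / sigma       # mean of the test statistic under H1
    return 1.0 - std_normal.cdf(z_crit - shift)

power_small_n = z_test_power(effect=0.5, sigma=1.0, n=10)
power_large_n = z_test_power(effect=0.5, sigma=1.0, n=25)
type_ii_error = 1.0 - power_large_n  # probability of missing a real effect
```

With zero true effect the function returns `alpha`, and power rises toward 1 as the sample size or effect size grows.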
The Gompertz–Makeham law states that the human death rate is the sum of an age-dependent component, which increases exponentially with age and an age-independent component. In a protected environment where external causes of death are rare, the age-independent mortality component is often negligible. In this case the formula simplifies to a Gompertz law of mortality. In 1825, Benjamin Gompertz proposed an exponential increase in death rates with age.
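In its usual form the law gives the death rate at age x as α·exp(βx) + λ, where the exponential term is the age-dependent part and λ the age-independent part. The sketch below uses that form; the symbol names and parameter values are illustrative assumptions, not fitted estimates.

```python
import math

def gompertz_makeham_hazard(age, alpha, beta, lam):
    """Death rate at `age`: exponential age-dependent term plus constant term."""
    return alpha * math.exp(beta * age) + lam

def gompertz_hazard(age, alpha, beta):
    """Special case with a negligible age-independent component (lam = 0)."""
    return gompertz_makeham_hazard(age, alpha, beta, lam=0.0)

# Hypothetical parameters, chosen only to show the shape of the curve.
alpha, beta, lam = 1e-4, 0.085, 5e-4

# Under the pure Gompertz law the death rate doubles every ln(2)/beta years.
doubling_time = math.log(2) / beta
rate_40 = gompertz_hazard(40, alpha, beta)
rate_later = gompertz_hazard(40 + doubling_time, alpha, beta)
```

Setting `lam` to zero recovers the simplified Gompertz law described above for protected environments.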
Single-letter second-level domains are domain names in which the second-level domain consists of only one letter, such as x.com. In 1993, the Internet Assigned Numbers Authority (IANA) explicitly reserved all single-letter and single-digit second-level domain names in the top-level domains com, net, and org, and grandfathered those that had already been assigned. In December 2005, ICANN considered auctioning these domains.
Jonathan Bruce Postel was an American computer scientist who made many significant contributions to the development of the Internet, particularly with respect to standards. He is known principally for being the Editor of the Request for Comments (RFC) document series, for Simple Mail Transfer Protocol (SMTP), and for administering the Internet Assigned Numbers Authority (IANA) until his death. In his lifetime he was known as the "god of the Internet" for his comprehensive influence on the medium, although Postel himself noted that this "compliment" came with a barb: the article that introduced it also suggested that he should be replaced by a "Professional."
Jorge Francisco Isidoro Luis Borges Acevedo was an Argentine short-story writer, essayist, poet and translator, and a key figure in Spanish-language and universal literature. His best-known books, Ficciones (Fictions) and El Aleph, published in the 1940s, are compilations of short stories interconnected by common themes, including dreams, labyrinths, philosophers, libraries, mirrors, fictional writers, and mythology. Borges' works have contributed to philosophical literature and the fantasy genre, and have been considered by some critics to mark the beginning of the magic realist movement in 20th century Latin American literature. His late poems converse with such cultural figures as Spinoza, Camões, and Virgil.
[6pg Borges essay on the literary merits of different translations of Homer and the problems of translation: the Newman-Arnold debate encapsulates the basic problem of literality vs literary. Borges gives translations of one passage by Buckley, Butcher & Lang, Cowper, Pope, Chapman, and Butler. Which is best? See also Borges 1936, “The Translators of the Thousand and One Nights”, a much more extended discussion of different translations of a work.]
At Trieste, in 1872, in a palace with damp statues and deficient hygienic facilities, a gentleman on whose face an African scar told its tale – Captain Richard Francis Burton, the English consul – embarked on a famous translation of the Quitab alif laila ua laila, which the roumis know by the title The Thousand and One Nights. One of the secret aims of his work was the annihilation of another gentleman (also weather-beaten, and with a dark and Moorish beard) who was compiling a vast dictionary in England and who died long before he was annihilated by Burton. That gentleman was Edward Lane, the Orientalist, author of a highly scrupulous version of The Thousand and One Nights that had supplanted a version by Galland. Lane translated against Galland, Burton against Lane; to understand Burton we must understand this hostile dynasty.
Borges considers the problem of whether Argentinian writing on non-Argentinian subjects can still be truly "Argentine." His conclusion: ...We should not be alarmed and that we should feel that our patrimony is the universe; we should essay all themes, and we cannot limit ourselves to purely Argentine subjects in order to be Argentine; for either being Argentine is an inescapable act of fate—and in that case we shall be so in all events—or being Argentine is a mere affectation, a mask. I believe that if we surrender ourselves to that voluntary dream which is artistic creation, we shall be Argentine and we shall also be good or tolerable writers.
The Secret History of the Mongols is the oldest surviving literary work in the Mongolian language. It was written for the Mongol royal family some time after the 1227 death of Genghis Khan. The author is anonymous and probably originally wrote in the Mongolian script, but the surviving texts all derive from transcriptions or translations into Chinese characters that date from the end of the 14th century and were compiled by the Ming dynasty under the title The Secret History of the Yuan Dynasty. It is also known as Tobchiyan in the History of Yuan.
Dunkirk is a 2017 war film written, directed, and produced by Christopher Nolan that depicts the Dunkirk evacuation of World War II. Its ensemble cast includes Fionn Whitehead, Tom Glynn-Carney, Jack Lowden, Harry Styles, Aneurin Barnard, James D'Arcy, Barry Keoghan, Kenneth Branagh, Cillian Murphy, Mark Rylance, and Tom Hardy. The film was distributed by Warner Bros.
Geometry Wars 3: Dimensions is a 2014 multidirectional shooter video game developed by Lucid Games and published by Activision under the Sierra Entertainment brand name. The game was released on November 25, 2014 for Microsoft Windows, OS X, GNU/Linux, PlayStation 3 and PlayStation 4, on November 26, 2014 for Xbox 360 and Xbox One and in the middle of 2015 for iOS and Android. As the sequel to Geometry Wars: Retro Evolved 2, Geometry Wars 3: Dimensions is the first Sierra video game not to be owned by their former owner Vivendi and the first game in the series to be released on Sony platforms. It is the sixth installment in the Geometry Wars series and the first one developed after the creator of the series Bizarre Creations was shut down by Activision.
GridWars (or GridWars 2) is a video game by Canadian developer Marco Incitti, inspired by Bizarre Creations' Geometry Wars. The first version of the game was released as freeware on December 21, 2005 on Marco Incitti's Blitz Creations website.
Subscription page for the monthly gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives since December 2013.
The President's Malaria Initiative (PMI) is a U.S. Government initiative to control and eliminate malaria, one of the leading global causes of premature death and disability. The initiative was originally launched by U.S. president George W. Bush in 2005, and has been continued by each successive U.S. president.
The Book of the Thousand Nights and a Night (1885), subtitled A Plain and Literal Translation of the Arabian Nights Entertainments, is an English language translation of One Thousand and One Nights – a collection of Middle Eastern and South Asian stories and folk tales compiled in Arabic during the Islamic Golden Age – by the British explorer and Arabist Richard Francis Burton (1821–1890). It stood as the only complete translation of the Macnaghten or Calcutta II edition of the "Arabian Nights" until the Malcolm C. and Ursula Lyons translation in 2008.
Antoine Galland was a French orientalist and archaeologist, most famous as the first European translator of One Thousand and One Nights, which he called Les mille et une nuits. His version of the tales appeared in twelve volumes between 1704 and 1717 and exerted a significant influence on subsequent European literature and attitudes to the Islamic world. Jorge Luis Borges has suggested that Romanticism began when his translation was first read.
Edward William Lane was a British orientalist, translator and lexicographer. He is known for his Manners and Customs of the Modern Egyptians and the Arabic-English Lexicon, as well as his translations of One Thousand and One Nights and Selections from the Kur-án.
Sir Richard Francis Burton was a British explorer, geographer, translator, writer, soldier, orientalist, cartographer, ethnographer, ethnologist, spy, linguist, poet, fencer, Freemason, and diplomat. He was famed for his travels and explorations in Asia, Africa, and the Americas, as well as his extraordinary knowledge of languages and cultures. According to one count, he spoke 29 European, Asian, and African languages.
Joseph Charles Mardrus, otherwise known as "Jean-Charles Mardrus" (1868–1949), was a French physician, poet, and a noted translator. Today he is best known for his translation of the Thousand and One Nights from Arabic into French, which was published from 1898 to 1904, and was in turn rendered into English by Edward Powys Mathers. A newer edition, Le livre des mille nuits et une nuit, was published in 1926–1932.