2018-turley.pdf: “Multi-trait analysis of genome-wide association summary statistics using MTAG”, (2017-10-23):
We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (Neff = 354,862), neuroticism (n = 168,105), and subjective well-being (n = 388,538). As compared to the 32, 9, and 13 genome-wide statistically-significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
Intelligence, or general cognitive function, is phenotypically and genetically correlated with many traits, including many physical and mental health variables. Both education and household income are strongly genetically correlated with intelligence, at rg = 0.73 and rg = 0.70 respectively. This allowed us to utilize a novel approach, Multi-Trait Analysis of Genome-wide association studies (MTAG; Turley et al 2017), to combine two large genome-wide association studies (GWASs) of education and household income to increase power in the largest GWAS on intelligence so far (Sniekers et al 2017). This study had four goals: firstly, to facilitate the discovery of new genetic loci associated with intelligence; secondly, to add to our understanding of the biology of intelligence differences; thirdly, to examine whether combining genetically correlated traits in this way produces results consistent with the primary phenotype of intelligence; and, finally, to test how well this new meta-analytic data sample on intelligence predicts phenotypic intelligence variance in an independent sample. We apply MTAG to three large GWAS: Sniekers et al 2017 on intelligence, Okbay et al 2016 on educational attainment, and Hill et al 2016 on household income. By combining these three samples our functional sample size increased from 78 308 participants to 147 194. We found 107 independent loci associated with intelligence, implicating 233 genes, using both SNP-based and gene-based GWAS. We find evidence that neurogenesis may explain some of the biological differences in intelligence as well as genes expressed in the synapse and those involved in the regulation of the nervous system. We show that the results of our combined analysis demonstrate the same pattern of genetic correlations as a single measure/the simple measure of intelligence, providing support for the meta-analysis of these genetically-related phenotypes.
We find that our MTAG meta-analysis of intelligence shows similar genetic correlations to 26 other phenotypes when compared with a GWAS consisting solely of cognitive tests. Finally, using an independent sample of 6 844 individuals we were able to predict 7% of intelligence using SNP data alone.
We used a case-control genome-wide association (GWA) design with cases consisting of 1238 individuals from the top 0.0003 (~170 mean IQ) of the population distribution of intelligence and 8172 unselected population-based controls. The single-nucleotide polymorphism heritability for the extreme IQ trait was 0.33 (0.02), which is the highest so far for a cognitive phenotype, and statistically-significant genome-wide genetic correlations of 0.78 were observed with educational attainment and 0.86 with population IQ. Three variants in locus ADAM12 achieved genome-wide statistical-significance, although they did not replicate with published GWA analyses of normal-range IQ or educational attainment. A genome-wide polygenic score constructed from the GWA results accounted for 1.6% of the variance of intelligence in the normal range in an unselected sample of 3414 individuals, which is comparable to the variance explained by GWA studies of intelligence with substantially larger sample sizes. The gene family plexins, members of which are mutated in several monogenic neurodevelopmental disorders, was statistically-significantly enriched for associations with high IQ. This study shows the utility of extreme trait selection for genetic study of intelligence and suggests that extremely high intelligence is continuous genetically with normal-range intelligence in the population.
“Genome-wide genetic data on ~500,000 UK Biobank participants”, (2017-07-20):
The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged between 40–69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data (such as population structure and relatedness) that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100× to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.
Neuroticism is a stable personality trait 1; twin studies report heritability between 30% and 50% 2, and SNP-based heritability is about 15% 3. Higher levels of neuroticism are associated with poorer mental and physical health 4,5, and the economic burden of neuroticism for societies is high 6. To date, genome-wide association (GWA) studies of neuroticism have identified up to 11 genetic loci 3,7. Here we report 116 significant independent genetic loci from a GWA of neuroticism in 329,821 UK Biobank participants, with replication available in a GWA meta-analysis of neuroticism in 122,867 individuals. Genetic signals for neuroticism were enriched in neuronal genesis and differentiation pathways, and substantial genetic correlations were found between neuroticism and depressive symptoms (rg = 0.82, SE = 0.03), major depressive disorder (rg = 0.69, SE = 0.07) and subjective wellbeing (rg = -0.68, SE = 0.03) alongside other mental health traits. These discoveries significantly advance our understanding of neuroticism and its association with major depressive disorder.
The ability of a population to adapt to changes in their living conditions, whether in nature or captivity, often depends on polymorphisms in multiple genes across the genome. In-depth studies of such polygenic adaptations are difficult in natural populations, but can be approached using the resources provided by artificial selection experiments. Here, we dissect the genetic mechanisms involved in long-term selection responses of the Virginia chicken lines, populations that after 40 generations of divergent selection for 56-day body weight display a nine-fold difference in the selected trait. In the F15 generation of an intercross between the divergent lines, 20 loci explained more than 60% of the additive genetic variance for the selected trait. We focused particularly on seven major QTL and found that only two fine-mapped to single, bi-allelic loci; the other five contained linked loci, multiple alleles or were epistatic. This detailed dissection of the polygenic adaptations in the Virginia lines provides a deeper understanding of genome-wide mechanisms involved in the long-term selection responses. The results illustrate that long-term selection responses, even from populations with a limited genetic diversity, can be polygenic and influenced by a range of genetic mechanisms.
Our understanding of the genetic basis of human adaptation is biased toward loci of large phenotypic effect. Genome wide association studies (GWAS) now enable the study of genetic adaptation in highly polygenic phenotypes. Here we test for polygenic adaptation among 187 world-wide human populations using polygenic scores constructed from GWAS of 34 complex traits. Comparing these polygenic scores to a null distribution under genetic drift, we identify strong signals of selection for a suite of anthropometric traits including height, infant head circumference (IHC), hip circumference and waist-to-hip ratio (WHR), as well as type 2 diabetes (T2D). In addition to the known north-south gradient of polygenic height scores within Europe, we find that natural selection has contributed to a gradient of decreasing polygenic height scores from West to East across Eurasia. Analyzing a set of ancient DNA samples from across Eurasia, we show that much of this gradient can be explained by selection for increased height in two long diverged hunter-gatherer populations living in western and west-central Eurasia sometime during or shortly after the last glacial maximum. We find that the signal of selection on hip circumference can largely be explained as a correlated response to selection on height. However, our signals in IHC and WHR cannot, suggesting that these patterns are the result of selection along multiple axes of body shape variation. Our observation that IHC and WHR polygenic scores follow a strong latitudinal cline in Western Eurasia supports the role of natural selection in establishing Bergmann’s Rule in humans, and is consistent with thermoregulatory adaptation in response to latitudinal temperature variation.
One Sentence Summary
Natural selection has led to divergence in multiple quantitative traits in humans across Eurasian populations.
“A Population Genetic Signal of Polygenic Adaptation”, (2014-04-17):
Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We use a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model, we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of QST ⁄ FST comparisons to test for over-dispersion of genetic values among populations. Finally we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single locus equivalents due to the fact that they look for positive covariance between like effect alleles, and also substantially outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
The process of adaptation is of fundamental importance in evolutionary biology. Within the last few decades, genotyping technologies and new statistical methods have given evolutionary biologists the ability to identify individual regions of the genome that are likely to have been important in this process. When adaptation occurs in traits that are underwritten by many genes, however, the genetic signals left behind are more diffuse, and no individual region of the genome is likely to show strong signatures of selection. Identifying this signature therefore requires a detailed annotation of sites associated with a particular phenotype. Here we develop and implement a suite of statistical methods to integrate this sort of annotation from genome wide association studies with allele frequency data from many populations, providing a powerful way to identify the signal of adaptation in polygenic traits. We apply our methods to test for the impact of selection on human height, skin pigmentation, body mass index, type 2 diabetes risk, and inflammatory bowel disease risk. We find relatively strong signals for height and skin pigmentation, moderate signals for inflammatory bowel disease, and comparatively little evidence for body mass index and type 2 diabetes risk.
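The abstract above describes estimating each population's mean additive genetic value as a simple weighted sum of allele frequencies. A minimal sketch of that one step follows; the function name, the factor of 2 (a common diploid convention), and all numbers are illustrative assumptions, not taken from the paper's data or code.

```python
import numpy as np

def genetic_values(effect_sizes, allele_freqs):
    """Estimated mean additive genetic value for each population.

    effect_sizes: shape (L,), additive effect per copy of the effect allele.
    allele_freqs: shape (M, L), effect-allele frequency in each of M populations.
    Returns shape (M,): 2 * sum over loci of effect * frequency
    (the factor 2 assumes diploid individuals).
    """
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    allele_freqs = np.asarray(allele_freqs, dtype=float)
    return 2.0 * allele_freqs @ effect_sizes

# Made-up effect sizes for 3 loci and frequencies for 2 hypothetical populations.
alpha = np.array([0.10, -0.05, 0.20])
freqs = np.array([[0.5, 0.5, 0.5],
                  [0.6, 0.4, 0.7]])
values = genetic_values(alpha, freqs)
```

The tests described in the abstract then ask whether differences among such values exceed what neutral drift would produce; that null model is not sketched here.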
Subreddit devoted to discussion of reinforcement learning research and projects, particularly deep reinforcement learning (more specialized than /r/MachineLearning). Major themes include deep learning, model-based vs model-free RL, robotics, multi-agent RL, exploration, meta-reinforcement learning, imitation learning, the psychology of RL in biological organisms such as humans, and safety/AI risk. Moderate activity level (as of 2019-09-11): ~10k subscribers, 2k pageviews/daily
“Learning model-based planning from scratch”, (2017-07-19):
Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the “Imagination-based Planner”, the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a “plan context” which conditions future real and imagined actions. The agent can even decide how to imagine: testing out alternative imagined actions, chaining sequences of actions together, or building a more complex “imagination tree” by navigating flexibly among the previously imagined states using a learned policy. And our agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated with using its imagination. We show that our architecture can learn to solve a challenging continuous control problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new direction toward learning the components of a model-based planning system and how to use them.
“Imagination-Augmented Agents for Deep Reinforcement Learning”, (2017-07-19):
We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.
In this paper, we introduce Path Integral Networks (PI-Net), a recurrent network representation of the Path Integral optimal control algorithm. The network includes both system dynamics and cost models, used for optimal control based planning. PI-Net is fully differentiable, learning both dynamics and cost models end-to-end by back-propagation and stochastic gradient descent. Because of this, PI-Net can learn to plan. PI-Net has several advantages: it can generalize to unseen states thanks to planning, it can be applied to continuous control tasks, and it allows for a wide variety of learning schemes, including imitation and reinforcement learning. Preliminary experiment results show that PI-Net, trained by imitation learning, can mimic control demonstrations for two simulated problems: a linear system and a pendulum swing-up problem. We also show that PI-Net is able to learn dynamics and cost models latent in the demonstrations.
“Value Prediction Network”, (2017-07-11):
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
“Prediction and Control with Temporal Segment Models”, (2017-03-12):
We introduce a method for learning the dynamics of complex nonlinear systems based on deep generative models over temporal segments of states and actions. Unlike dynamics models that operate over individual discrete timesteps, we learn the distribution over future state trajectories conditioned on past state, past action, and planned future action trajectories, as well as a latent prior over action trajectories. Our approach is based on convolutional autoregressive models and variational autoencoders. It makes stable and accurate predictions over long horizons for complex, stochastic systems, effectively expressing uncertainty and modeling the effects of collisions, sensory noise, and action delays. The learned dynamics model and action prior can be used for end-to-end, fully differentiable trajectory optimization and model-based policy optimization, which we use to evaluate the performance and sample-efficiency of our method.
Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3–5× on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf
“Model-based Adversarial Imitation Learning”, (2016-12-07):
Generative adversarial learning is a popular new approach to training generative models which has been proven successful for other related problems as well. The general idea is to maintain an oracle D that discriminates between the expert’s data distribution and that of the generative model G. The generative model is trained to capture the expert’s distribution by maximizing the probability of D misclassifying the data it generates. Overall, the system is differentiable and is trained using basic backpropagation.
This type of learning was successfully applied to the problem of policy imitation in a model-free setup. However, a model-free approach does not allow the system to be differentiable, which requires the use of high-variance gradient estimations. In this paper we introduce the Model based Adversarial Imitation Learning (MAIL) algorithm, a model-based approach for the problem of adversarial imitation learning. We show how to use a forward model to make the system fully differentiable, which enables us to train policies using the (stochastic) gradient of D. Moreover, our approach requires relatively few environment interactions, and fewer hyper-parameters to tune.
We test our method on the MuJoCo physics simulator and report initial results that surpass the current state-of-the-art.
We present a new approach to learning for planning, where knowledge acquired while solving a given set of planning problems is used to plan faster in related, but new problem instances. We show that a deep neural network can be used to learn and represent a generalized reactive policy (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances. In contrast to efforts in this direction, our approach significantly reduces the dependence of learning on handcrafted domain knowledge or feature selection. Instead, the GRP is trained from scratch using a set of successful execution traces. We show that our approach can also be used to automatically learn a heuristic function that can be used in directed search algorithms. We evaluate our approach using an extensive suite of experiments on two challenging planning problem domains and show that our approach facilitates learning complex decision making policies and powerful heuristic functions with minimal human input. Videos of our results are available at goo.gl/Hpy4e3.
“Deep Visual Foresight for Planning Robot Motion”, (2016-10-03):
A key challenge in scaling up robot learning to many skills and environments is removing the need for human supervision, so that robots can collect their own data and improve their own performance without being limited by the cost of requesting human feedback. Model-based reinforcement learning holds the promise of enabling an agent to learn to predict the effects of its actions, which could provide flexible predictive models for a wide range of tasks and environments, without detailed human supervision. We develop a method for combining deep action-conditioned video prediction models with model-predictive control that uses entirely unlabeled training data. Our approach does not require a calibrated camera, an instrumented training set-up, nor precise sensing and actuation. Our results show that our method enables a real robot to perform nonprehensile manipulation (pushing objects) and can handle novel objects not seen during training.
AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven’t yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human “in the loop” and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human’s intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent’s learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.
Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. The key contribution of this work is the design of a new search space (the “NASNet search space”) which enables transferability. In our experiments, we search for the best convolutional layer (or “cell”) on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, named “NASNet architecture”. We also introduce a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models. On CIFAR-10 itself, NASNet achieves 2.4% error rate, which is state-of-the-art. On ImageNet, NASNet achieves, among the published works, state-of-the-art accuracy of 82.7% top-1 and 96.2% top-5. Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS, a reduction of 28% in computational demand from the previous state-of-the-art model. When evaluated at different levels of computational cost, accuracies of NASNets exceed those of the state-of-the-art human-designed models. For instance, a small version of NASNet also achieves 74% top-1 accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. Finally, the learned features by NASNet used with the Faster-RCNN framework surpass state-of-the-art by 4.0% achieving 43.1% mAP on the COCO dataset.
“Efficient Architecture Search by Network Transformation”, (2017-07-16):
Techniques for automatically designing deep neural network architectures such as reinforcement learning based approaches have recently shown promising results. However, their success is based on vast computational resources (e.g. hundreds of GPUs), making them difficult to be widely used. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework toward efficient architecture search by exploring the architecture space based on the current network and reusing its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, the previously validated networks can be reused for further exploration, thus saving a large amount of computational cost. We apply our method to explore the architecture space of the plain convolutional neural networks (no skip-connections, branching etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.
“Net2Net: Accelerating Learning via Knowledge Transfer”, (2015-11-18):
We introduce techniques for rapidly transferring the information stored in one neural net into another neural net. The main purpose is to accelerate the training of a significantly larger neural net. During real-world workflows, one often trains very many different neural networks during the experimentation and design process. This is a wasteful process in which each new model is trained from scratch. Our Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network. Our techniques are based on the concept of function-preserving transformations between neural network specifications. This differs from previous approaches to pre-training that altered the function represented by a neural net when adding layers to it. Using our knowledge transfer mechanism to add depth to Inception modules, we demonstrate a new state of the art accuracy rating on the ImageNet dataset.
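To make "function-preserving transformation" concrete, here is a minimal sketch of widening the hidden layer of a two-layer ReLU network so that the wider network computes exactly the same outputs. It assumes a plain fully connected network and a replicate-and-rescale scheme; the helper names (`forward`, `widen`) and the toy dimensions are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W1, b1, W2, b2):
    """Two-layer MLP: ReLU hidden layer followed by a linear output layer."""
    return relu(x @ W1 + b1) @ W2 + b2

def widen(W1, b1, W2, new_width, rng):
    """Widen the hidden layer from n to new_width units without changing the function.

    Extra units are copies of randomly chosen existing units; the outgoing
    weights of every unit are divided by its number of copies, so the summed
    contribution to the next layer is unchanged.
    """
    n = W1.shape[1]
    assert new_width >= n
    # The first n units map to themselves; the rest copy a random existing unit.
    mapping = np.concatenate([np.arange(n), rng.integers(0, n, size=new_width - n)])
    counts = np.bincount(mapping, minlength=n)  # number of copies of each original unit
    W1_new = W1[:, mapping]
    b1_new = b1[mapping]
    W2_new = W2[mapping, :] / counts[mapping][:, None]
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 4))

W1w, b1w, W2w = widen(W1, b1, W2, new_width=6, rng=rng)
same = np.allclose(forward(x, W1, b1, W2, b2), forward(x, W1w, b1w, W2w, b2))
```

Because the widened network starts from the same function, further training continues from the smaller network's performance rather than from scratch.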
“Noisy Networks for Exploration”, (2017-06-30):
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent’s policy can be used to aid efficient exploration.
The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead.
We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and ε-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.
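As a rough illustration of "parametric noise added to its weights", the sketch below implements the forward pass of a linear layer whose weights and biases are a learned mean plus a learned scale times freshly sampled Gaussian noise. It assumes independent noise per parameter; the function name and shapes are hypothetical, and training of the mean and scale parameters by gradient descent is omitted.

```python
import numpy as np

def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """Forward pass of a linear layer with noisy parameters.

    mu_* are the usual weights and biases; sigma_* scale the injected noise.
    Fresh noise is drawn on every call, so repeated calls give different
    outputs, which is what makes the resulting policy stochastic.
    """
    eps_w = rng.standard_normal(mu_w.shape)
    eps_b = rng.standard_normal(mu_b.shape)
    W = mu_w + sigma_w * eps_w
    b = mu_b + sigma_b * eps_b
    return x @ W + b

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
mu_w, mu_b = rng.normal(size=(4, 3)), rng.normal(size=3)

# With zero noise scale the layer reduces to an ordinary linear layer.
plain = noisy_linear(x, mu_w, np.zeros((4, 3)), mu_b, np.zeros(3), rng)

# With nonzero scale, two calls on the same input differ.
sigma_w, sigma_b = np.full((4, 3), 0.5), np.full(3, 0.5)
out_a = noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng)
out_b = noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng)
```

Since the noise scales are parameters, a trained agent can shrink them where exploration is no longer useful, unlike a fixed ε-greedy schedule.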
This paper introduces the Intentional Unintentional (IU) agent. This agent endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but it also learns faster than agents that target a single task at-a-time. In some cases, where the single task method completely fails, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine, and introduce a grounded formal language to automatically generate tasks.
“Distral: Robust Multitask Reinforcement Learning”, (2017-07-13):
Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a “distilled” policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
“Generative Adversarial Imitation Learning”, (2016-06-10):
Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert’s cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.
Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-dimensional humanoid bodies. However, methods that use pure reinforcement learning with simple reward functions tend to produce non-humanlike and overly stereotyped movement behaviors. In this work, we extend generative adversarial imitation learning to enable training of generic neural network policies to produce humanlike movement patterns from limited demonstrations consisting only of partially observed state features, without access to actions, even when the demonstrations come from a body with different and unknown physical parameters. We leverage this approach to build sub-skill policies from motion capture data and show that they can be reused to solve tasks when controlled by a higher level controller.
“Deep Q-learning from Demonstrations”, (2017-04-12):
Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstrator’s actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it takes PDD DQN 83 million steps to catch up to DQfD’s performance. DQfD learns to out-perform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.
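The combination of temporal difference updates with supervised classification of the demonstrator’s actions can be illustrated with a minimal sketch. This is not the authors’ implementation: the function names, the one-step TD target, the large-margin form of the classification loss, and the values of `gamma`, `margin`, and `lam` are all assumptions chosen for illustration.

```python
import numpy as np

def td_loss(q_sa, reward, q_next_max, gamma=0.99):
    """Squared one-step TD error for a single transition (assumed form)."""
    target = reward + gamma * q_next_max
    return (target - q_sa) ** 2

def margin_classification_loss(q_values, demo_action, margin=0.8):
    """Large-margin loss: zero only when the demonstrator's action has a
    Q-value at least `margin` above every other action's Q-value."""
    q_values = np.asarray(q_values, dtype=float)
    penalties = np.full_like(q_values, margin)
    penalties[demo_action] = 0.0  # no penalty for agreeing with the demonstrator
    return float(np.max(q_values + penalties) - q_values[demo_action])

def combined_loss(q_values, action, reward, q_next_max, is_demo, lam=1.0):
    """TD loss on every transition; supervised term only on demonstration data.
    `lam` is a hypothetical weighting between the two terms."""
    q_values = np.asarray(q_values, dtype=float)
    loss = td_loss(q_values[action], reward, q_next_max)
    if is_demo:
        loss += lam * margin_classification_loss(q_values, action)
    return float(loss)

# Hypothetical Q-values for three actions; the demonstrator chose action 0.
q = np.array([1.0, 2.0, 0.5])
demo_loss = combined_loss(q, action=0, reward=1.0, q_next_max=0.0, is_demo=True)
agent_loss = combined_loss(q, action=0, reward=1.0, q_next_max=0.0, is_demo=False)
```

In this toy case the TD term is zero, so the whole of `demo_loss` comes from the supervised term, which pushes the demonstrated action’s Q-value above the competing actions.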
“Robust Physical-World Attacks on Deep Learning Models”, (2017-07-27):
Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms.
We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints.
Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.
“A Simple Neural Attentive Meta-Learner”, (2017-07-11):
Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, many recent meta-learning approaches are extensively hand-designed, either using architectures specialized to a particular application, or hard-coding algorithmic components that constrain how the meta-learner solves the task. We propose a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information. In the most extensive set of meta-learning experiments to date, we evaluate the resulting Simple Neural AttentIve Learner (or SNAIL) on several heavily-benchmarked tasks. On all tasks, in both supervised and reinforcement learning, SNAIL attains state-of-the-art performance by significant margins.
“Preprint Déjà Vu: an FAQ”, (2017-06-13):
I give a brief overview of arXiv history, and describe the current state of arXiv practice, both technical and sociological. This commentary originally appeared in the EMBO Journal, 19 Oct 2016. It was intended as an update on comments from the late 1990s regarding use of preprints by biologists (or lack thereof), but may be of interest to practitioners of other disciplines. It is based largely on a keynote presentation I gave to the ASAPbio inaugural meeting in Feb 2016, and responds as well to some follow-up questions.
Background: Despite substantial financial contributions by the United States President’s Malaria Initiative (PMI) since 2006, no studies have carefully assessed how this program may have affected important population-level health outcomes. We utilized multiple publicly available data sources to evaluate the association between introduction of PMI and child mortality rates in sub-Saharan Africa (SSA).
Methods and findings: We used difference-in-differences analyses to compare trends in the primary outcome of under-5 mortality rates and secondary outcomes reflecting population coverage of malaria interventions in 19 PMI-recipient and 13 non-recipient countries between 1995 and 2014. The analyses controlled for presence and intensity of other large funding sources, individual and household characteristics, and country and year fixed effects.
PMI program implementation was associated with a statistically-significant reduction in the annual risk of under-5 child mortality (adjusted risk ratio [RR] 0.84, 95% CI 0.74–0.96). Each dollar of per-capita PMI expenditures in a country, a measure of PMI intensity, was also associated with a reduction in child mortality (RR 0.86, 95% CI 0.78–0.93). We estimated that the under-5 mortality rate in PMI countries was reduced from 28.9 to 24.3 per 1,000 person-years. Population coverage of insecticide-treated nets increased by 8.34 percentage points (95% CI 0.86–15.83) and coverage of indoor residual spraying increased by 6.63 percentage points (95% CI 0.79–12.47) after PMI implementation. Per-capita PMI spending was also associated with a modest increase in artemisinin-based combination therapy coverage (3.56 percentage point increase, 95% CI −0.07–7.19), though this association was only marginally statistically-significant (p = 0.054). Our results were robust to several sensitivity analyses. Because our study design leaves open the possibility of unmeasured confounding, we cannot definitively interpret these results as causal.
Conclusions: PMI may have statistically-significantly contributed to reducing the burden of malaria in SSA and reducing the number of child deaths in the region. Introduction of PMI was associated with increased coverage of malaria prevention technologies, which are important mechanisms through which child mortality can be reduced. To our knowledge, this study is the first to assess the association between PMI and all-cause child mortality in SSA with the use of appropriate comparison groups and adjustments for regional trends in child mortality.
Why was this study done?
- Despite the considerable investment the US government has made in the President’s Malaria Initiative (PMI) since 2006, no studies to date have evaluated its association with population health outcomes.
- Previous evaluations have documented decreasing child mortality and increasing use of key malaria interventions in PMI-recipient countries. Our study sought to determine whether the trends in health outcomes in PMI-recipient countries differed statistically-significantly from the trends in these outcomes in PMI non-recipient countries in sub-Saharan Africa (SSA) over the past 2 decades.
What did the researchers do and find?
- We used a study design that leveraged multiple publicly available data sources from countries throughout SSA, spanning the years before and after PMI introduction, in order to estimate association between the introduction of PMI and child mortality rates.
- Our dataset included 7,752,071 child-year observations from 2,112,951 individual children who lived in 32 sub-Saharan countries, including all 19 PMI countries.
- We found that after adjusting for baseline differences between countries, overall time trends, other funding sources, and individual characteristics, PMI was associated with 16% annual risk reduction in child mortality and increased population coverage of key malaria prevention and treatment technologies.
- We tested the robustness of our results with a series of sensitivity analyses.
What do these findings mean?
- The study provides evidence that introduction of PMI was associated with reductions in child mortality in SSA, primarily through increased access to malaria prevention technologies.
- Evidence from this study can be used to inform policy decisions about future funding levels for malaria interventions.
- The interpretation of our study results rests on the assumption that there were no important unmeasured variables that differentially affected mortality rates in PMI and comparison countries during the study period.
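The headline PMI figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the abstract; the function names are hypothetical, and treating the ratio of the two mortality rates as comparable to the adjusted risk ratio is a simplifying assumption, not the study’s estimation method.

```python
def relative_risk_reduction(risk_ratio):
    """Convert a risk ratio into a proportional risk reduction."""
    return 1.0 - risk_ratio

def rate_ratio(rate_after, rate_before):
    """Ratio of two rates expressed in the same units."""
    return rate_after / rate_before

# Adjusted risk ratio of 0.84 corresponds to the quoted 16% annual risk reduction.
rrr = relative_risk_reduction(0.84)

# Under-5 mortality per 1,000 person-years: 28.9 before, 24.3 after.
ratio = rate_ratio(24.3, 28.9)
```

Rounded to two decimals, the ratio of the two mortality rates agrees with the adjusted risk ratio of 0.84, so the reported estimates appear internally consistent.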
Suicide is a major public health concern. High-dose lithium is used to stabilize mood and prevent suicide in patients with affective disorders. Lithium occurs naturally in drinking water worldwide in much lower doses, but with large geographical variation. Several studies conducted at an aggregate level have suggested an association between lithium in drinking water and a reduced risk of suicide; however, a causal relation is uncertain.
Individual-level register-based data on the entire Danish adult population (3.7 million individuals) from 1991 to 2012 were linked with a moving 5-year time-weighted average (TWA) lithium exposure level from drinking water, hypothesizing an inverse relationship. The mean lithium level was 11.6 μg/L, ranging from 0.6 to 30.7 μg/L. The suicide rate decreased from 29.7 per 100,000 person-years at risk in 1991 to 18.4 per 100,000 person-years in 2012.
We found no indication of an association between increasing 5-year TWA lithium exposure level and decreasing suicide rate. The comprehensiveness of using individual-level data and spatial analyses with 22 years of follow-up makes a pronounced contribution to previous findings.
Our findings demonstrate that there does not seem to be a protective effect of exposure to lithium on the incidence of suicide with levels below 31 μg/L in drinking water.
[Keywords: drinking water, lithium, suicide, individual-level data, spatial analysis, Denmark, exposure assessment]
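A time-weighted average (TWA) exposure of the kind used in the lithium study above weights each drinking-water concentration by the time spent exposed to it. The sketch below is a generic illustration, not the study’s exposure model: the function name and the example residence history are hypothetical.

```python
def time_weighted_average(periods):
    """periods: list of (years_at_residence, lithium_ug_per_l) pairs
    covering the averaging window (5 years in the study's case)."""
    total_years = sum(years for years, _ in periods)
    if total_years <= 0:
        raise ValueError("total exposure time must be positive")
    return sum(years * level for years, level in periods) / total_years

# Hypothetical person: 2 years at 10.0 ug/L, then 3 years at 20.0 ug/L.
example_twa = time_weighted_average([(2, 10.0), (3, 20.0)])
```

Recomputing this average over a sliding 5-year window for each person-year gives a moving TWA.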
1932-borges-thehomericversions.pdf: “The Homeric Versions”, (1932):
[6pg Borges essay on the literary merits of different translations of Homer and the problems of translation: the Newman-Arnold debate encapsulates the basic problem of literality vs literary. Borges gives translations of one passage by Buckley, Butcher & Lang, Cowper, Pope, Chapman, and Butler. Which is best? See also Borges 1936, “The Translators of the Thousand and One Nights”, a much more extended discussion of different translations of a work.]
1936-borges-thetranslatorsofthethousandandonenights.pdf: “The Translators of The Thousand and One Nights”, (1936):
[18pg Borges essay on translations of the collection of Arab fairytales The Thousand and One Nights: each translator—Galland, Lane, Burton, Littmann, Mardrus—criticized the previous translator by creation.]
At Trieste, in 1872, in a palace with damp statues and deficient hygienic facilities, a gentleman on whose face an African scar told its tale – Captain Richard Francis Burton, the English consul – embarked on a famous translation of the Quitab alif laila ua laila, which the roumis know by the title The Thousand and One Nights. One of the secret aims of his work was the annihilation of another gentleman (also weather-beaten, and with a dark and Moorish beard) who was compiling a vast dictionary in England and who died long before he was annihilated by Burton. That gentleman was Edward Lane, the Orientalist, author of a highly scrupulous version of The Thousand and One Nights that had supplanted a version by Galland. Lane translated against Galland, Burton against Lane; to understand Burton we must understand this hostile dynasty.
1951-borges-theargentinewriterandtradition.pdf: “The Argentine Writer and Tradition”, (1951):
Borges considers the problem of whether Argentinian writing on non-Argentinian subjects can still be truly “Argentine.” His conclusion: …We should not be alarmed and that we should feel that our patrimony is the universe; we should essay all themes, and we cannot limit ourselves to purely Argentine subjects in order to be Argentine; for either being Argentine is an inescapable act of fate—and in that case we shall be so in all events—or being Argentine is a mere affectation, a mask. I believe that if we surrender ourselves to that voluntary dream which is artistic creation, we shall be Argentine and we shall also be good or tolerable writers.