This is the July 2018 edition of the Gwern.net newsletter; previous, June 2018 (archives). This is a summary of the revision-history RSS feed, overlapping with my Changelog & /r/gwern/; brought to you by my donors on Patreon. This month I’d like to particularly thank my 2 anonymous donors, who gave $20,000.
Dactyl: “Learning Dexterous In-Hand Manipulation”, OpenAI 2018 (blog; videos: 1/2; PPO-LSTM+domain-randomization in MuJoCo/Unity for sim2real transfer in a robotic hand grasper—nothing really new here but it is yet another demonstration of what brute force + systems engineering can do with existing DRL.)
Three Parts Dead (Craft Sequence #1), Gladstone 2012 (urban fantasy, sold to me as being unusually legal and economic-themed; not bad, but those aspects turn out to be minimal, arguably less than, say, Spice & Wolf, and some parts are a little derivative—City of Doors, anyone? I was a little more intrigued by the overall worldbuilding, with a geopolitical order polarized between upstart immortal human mage-kings seizing power with magic-science, and retrenched old gods.)
The Japanese Family Storehouse, or, The Millionaire’s Gospel Modernised, Saikaku 1688, trans. Sargent (review)
Newsletter tag: archive of all issues back to 2013 for the gwern.net newsletter (monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.)
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up reasonably interesting changes and send them out to the mailing list along with a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as an RSS feed). Submissions are categorized similarly to the monthly newsletter and typically will be collated there.
“Genetic analysis of social-class mobility in five longitudinal studies”, Daniel W. Belsky, Benjamin W. Domingue, Robbee Wedow, Louise Arseneault, Jason D. Boardman, Avshalom Caspi, Dalton Conley, Jason M. Fletcher, Jeremy Freese, Pamela Herd, Terrie E. Moffitt, Richie Poulton, Kamil Sicinski, Jasmin Wertz, Kathleen Mullan Harris (2018-07-31):
Genome-wide association study (GWAS) discoveries about educational attainment have raised questions about the meaning of the genetics of success. These discoveries could offer clues about biological mechanisms or, because children inherit genetics and social class from parents, education-linked genetics could be spurious correlates of socially transmitted advantages. To distinguish between these hypotheses, we studied social mobility in five cohorts from three countries. We found that people with more education-linked genetics were more successful compared with parents and siblings. We also found mothers’ education-linked genetics predicted their children’s attainment over and above the children’s own genetics, indicating an environmentally mediated genetic effect. Findings reject pure social-transmission explanations of education GWAS discoveries. Instead, genetics influences attainment directly through social mobility and indirectly through family environments.
A summary genetic measure, called a “polygenic score,” derived from a genome-wide association study (GWAS) of education can modestly predict a person’s educational and economic success. This prediction could signal a biological mechanism: Education-linked genetics could encode characteristics that help people get ahead in life. Alternatively, prediction could reflect social history: People from well-off families might stay well-off for social reasons, and these families might also look alike genetically. A key test to distinguish biological mechanism from social history is if people with higher education polygenic scores tend to climb the social ladder beyond their parents’ position. Upward mobility would indicate education-linked genetics encodes characteristics that foster success. We tested if education-linked polygenic scores predicted social mobility in >20,000 individuals in five longitudinal studies in the United States, Britain, and New Zealand. Participants with higher polygenic scores achieved more education and career success and accumulated more wealth. However, they also tended to come from better-off families. In the key test, participants with higher polygenic scores tended to be upwardly mobile compared with their parents. Moreover, in sibling-difference analysis, the sibling with the higher polygenic score was more upwardly mobile. Thus, education GWAS discoveries are not mere correlates of privilege; they influence social mobility within a life. Additional analyses revealed that a mother’s polygenic score predicted her child’s attainment over and above the child’s own polygenic score, suggesting parents’ genetics can also affect their children’s attainment through environmental pathways. Education GWAS discoveries affect socioeconomic attainment through influence on individuals’ family-of-origin environments and their social mobility. [Keywords: genetics, social class, social mobility, sociogenomics, polygenic score]
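As a rough illustration of what a "polygenic score" is computationally, the sketch below sums a person's allele counts weighted by per-SNP GWAS effect sizes, then standardizes the scores within the sample. The function names, effect sizes, and genotypes are all made up for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev

def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts (0, 1, or 2 per SNP) for one person."""
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need one effect size per SNP")
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))

def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 within the sample."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]

# Toy data (hypothetical): 3 people x 4 SNPs, with invented effect sizes.
toy_genotypes = [
    [0, 1, 2, 0],
    [2, 2, 1, 1],
    [1, 0, 0, 2],
]
toy_effects = [0.02, -0.01, 0.03, 0.005]

raw_scores = [polygenic_score(g, toy_effects) for g in toy_genotypes]
z_scores = standardize(raw_scores)
```

Real scores sum over very many SNPs and usually adjust the weights for linkage disequilibrium; the weighted sum is the only part shown here.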
A previous genome-wide association study (GWAS) of more than 100,000 individuals identified molecular-genetic predictors of educational attainment. We undertook in-depth life-course investigation of the polygenic score derived from this GWAS using the four-decade Dunedin Study (N = 918). There were five main findings. First, polygenic scores predicted adult economic outcomes even after accounting for educational attainments. Second, genes and environments were correlated: Children with higher polygenic scores were born into better-off homes. Third, children’s polygenic scores predicted their adult outcomes even when analyses accounted for their social-class origins; social-mobility analysis showed that children with higher polygenic scores were more upwardly mobile than children with lower scores. Fourth, polygenic scores predicted behavior across the life course, from early acquisition of speech and reading skills through geographic mobility and mate choice and on to financial planning for retirement. Fifth, polygenic-score associations were mediated by psychological characteristics, including intelligence, self-control, and interpersonal skill. Effect sizes were small. Factors connecting GWAS sequence with life outcomes may provide targets for interventions to promote population-wide positive development. [Keywords: genetics, behavior genetics, intelligence, personality, adult development]
“Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals”, James J. Lee, Robbee Wedow, Aysu Okbay, Edward Kong, Omeed Maghzian, Meghan Zacher, Tuan Anh Nguyen-Viet, Peter Bowers, Julia Sidorenko, Richard Karlsson Linnér, Mark Alan Fontana, Tushar Kundu, Chanwook Lee, Hui Li, Ruoxi Li, Rebecca Royer, Pascal N. Timshel, Raymond K. Walters, Emily A. Willoughby, Loïc Yengo, 23andMe Research Team, COGENT (Cognitive Genomics Consortium), Social Science Genetic Association Consortium, Maris Alver, Yanchun Bao, David W. Clark, Felix R. Day, Nicholas A. Furlotte, Peter K. Joshi, Kathryn E. Kemper, Aaron Kleinman, Claudia Langenberg, Reedik Mägi, Joey W. Trampush, Shefali Setia Verma, Yang Wu, Max Lam, Jing Hua Zhao, Zhili Zheng, Jason D. Boardman, Harry Campbell, Jeremy Freese, Kathleen Mullan Harris, Caroline Hayward, Pamela Herd, Meena Kumari, Todd Lencz, Jian’an Luan, Anil K. Malhotra, Andres Metspalu, Lili Milani, Ken K. Ong, John R. B. Perry, David J. Porteous, Marylyn D. Ritchie, Melissa C. Smart, Blair H. Smith, Joyce Y. Tung, Nicholas J. Wareham, James F. Wilson, Jonathan P. Beauchamp, Dalton C. Conley, Tõnu Esko, Steven F. Lehrer, Patrik K. E. Magnusson, Sven Oskarsson, Tune H. Pers, Matthew R. Robinson, Kevin Thom, Chelsea Watson, Christopher F. Chabris, Michelle N. Meyer, David I. Laibson, Jian Yang, Magnus Johannesson, Philipp D. Koellinger, Patrick Turley, Peter M. Visscher, Daniel J. Benjamin, David Cesarini (10.1038/s41588-018-0147-3):
Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
Recent advances in genomics are producing powerful DNA predictors of complex traits, especially cognitive abilities. Here, we leveraged summary statistics from the most recent genome-wide association studies of intelligence and educational attainment to build prediction models of general cognitive ability and educational achievement. To this end, we compared the performances of multi-trait genomic and polygenic scoring methods. In a representative UK sample of 7,026 children at age 12 and 16, we show that we can now predict up to 11 percent of the variance in intelligence and 16 percent in educational achievement. We also show that predictive power increases from age 12 to age 16 and that genomic predictions do not differ for girls and boys. Multivariate genomic methods were effective in boosting predictive power and, even though prediction accuracy varied across polygenic scores approaches, results were similar using different multivariate and polygenic score methods. Polygenic scores for educational attainment and intelligence are the most powerful predictors in the behavioural sciences and exceed predictions that can be made from parental phenotypes such as educational attainment and occupational status.
Research on environmental and genetic pathways to complex traits such as educational attainment (EA) is confounded by uncertainty over whether correlations reflect effects of transmitted parental genes, causal family environments, or some, possibly interactive, mixture of both. Thus, an aggregate of thousands of alleles associated with EA (a polygenic risk score; PRS) may tap parental behaviors and home environments promoting EA in the offspring. New methods for unpicking and determining these causal pathways are required. Here, we utilize the fact that parents pass, at random, 50% of their genome to a given offspring to create independent scores for the transmitted alleles (conventional EA PRS) and a parental score based on alleles not transmitted to the offspring (EA VP_PRS). The formal effect of non-transmitted alleles on offspring attainment was tested in 2,333 genotyped twins for whom high-quality measures of EA, assessed at age 17 years, were available, and whose parents were also genotyped. Four key findings were observed. First, the EA PRS and EA VP_PRS were empirically independent, validating the virtual-parent design. Second, in this family-based design, children’s own EA PRS significantly predicted their EA (β = 0.15), ruling out stratification confounds as a cause of the association of attainment with the EA PRS. Third, parental EA PRS predicted the SES environment parents provided to offspring (β = 0.20), and parental SES and offspring EA were significantly associated (β = 0.33). This would suggest that the EA PRS is at least as strongly linked to social competence as it is to EA, leading to higher attained SES in parents and, therefore, a higher experienced SES for children. In a full structural equation model taking account of family genetic relatedness across multiple siblings the non-transmitted allele effects were estimated at similar values; but, in this more complex model, confidence intervals included zero. 
A test using the forthcoming EA3 PRS may clarify this outcome. The virtual-parent method may be applied to clarify causality in other phenotypes where observational evidence suggests parenting may moderate expression of other outcomes, for instance in psychiatry.
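The transmitted/non-transmitted split behind the virtual-parent design can be sketched in a few lines. At each SNP, the alleles the parents carry but did not pass on are the parental total minus the child's count. The function names, genotypes, and effect sizes below are hypothetical, and this is only a minimal sketch of the bookkeeping, not the authors' pipeline.

```python
def non_transmitted_counts(mother, father, child):
    """Per-SNP count of parental effect alleles NOT passed to the child.

    Each argument is a list of allele counts (0, 1, or 2) per SNP.
    """
    result = []
    for m, f, c in zip(mother, father, child):
        untransmitted = m + f - c
        if not 0 <= untransmitted <= 2:
            raise ValueError("genotypes are inconsistent with Mendelian inheritance")
        result.append(untransmitted)
    return result

def weighted_score(allele_counts, effect_sizes):
    """Polygenic score: allele counts weighted by GWAS effect sizes."""
    return sum(c * b for c, b in zip(allele_counts, effect_sizes))

# Toy trio (hypothetical genotypes and effect sizes).
mother = [1, 2, 0]
father = [0, 1, 1]
child = [1, 2, 0]
effects = [0.1, 0.2, -0.05]

virtual_parent = non_transmitted_counts(mother, father, child)
child_prs = weighted_score(child, effects)                    # transmitted alleles
virtual_parent_prs = weighted_score(virtual_parent, effects)  # non-transmitted alleles
```

Because the non-transmitted alleles never reach the child's genome, any association between the virtual-parent score and the child's attainment has to run through the environment the parents provide.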
Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a new method for polygenic prediction, LDpred-funct, that leverages trait-specific functional enrichments to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, which includes coding, conserved, regulatory and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. LDpred-funct attained higher prediction accuracy than other polygenic prediction methods in simulations using real genotypes. We applied LDpred-funct to predict 16 highly heritable traits in the UK Biobank. We used association statistics from British-ancestry samples as training data (avg N=365K) and samples of other European ancestries as validation data (avg N=22K), to minimize confounding. LDpred-funct attained a +27% relative improvement in prediction accuracy (avg prediction R²=0.173; highest R²=0.417 for height) compared to existing methods that do not incorporate functional information, consistent with simulations. For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (total N=1107K; higher heritability in UK Biobank cohort) increased prediction R² to 0.429. Our results show that modeling functional enrichment substantially improves polygenic prediction accuracy, bringing polygenic prediction of complex traits closer to clinical utility.
Background: There is now convincing evidence that pleiotropy across the genome contributes to the correlation between human traits and comorbidity of diseases. The recent availability of genome-wide association study (GWAS) results has made the polygenic risk score (PRS) approach a powerful way to perform genetic prediction and identify genetic overlap among phenotypes.
Methods and findings: Here we use the PRS method to assess evidence for shared genetic aetiology across hundreds of traits within a single epidemiological study – the Northern Finland Birth Cohort 1966 (NFBC1966). We replicate numerous recent findings, such as a genetic association between Alzheimer’s disease and lipid levels, while the depth of phenotyping in the NFBC1966 highlights a range of novel significant genetic associations between traits.
Conclusion: This study illustrates the power in taking a hypothesis-free approach to the study of shared genetic aetiology between human traits and diseases. It also demonstrates the potential of the PRS method to provide important biological insights using only a single well-phenotyped epidemiological study of moderate sample size (~5k), with important advantages over evaluating genetic correlations from GWAS summary statistics only.
We use a multi-stage genome-wide association of 1 million parental lifespans of genotyped subjects and data on mortality risk factors to validate previously unreplicated findings near CDKN2B-AS1, ATXN2/BRAP, FURIN/FES, ZW10, PSORS1C3, and 13q21.31, and identify and replicate novel findings near GADD45G, KCNK3, LDLR, POM121C, ZC3HC1, and ABO. We also validate previous findings near 5q33.3/EBF1 and FOXO3, whilst finding contradictory evidence at other loci. Gene set and tissue-specific analyses show that expression in foetal brain cells and adult dorsolateral prefrontal cortex is enriched for lifespan variation, as are gene pathways involving lipid proteins and homeostasis, vesicle-mediated transport, and synaptic function. Individual genetic variants that increase dementia, cardiovascular disease, and lung cancer (but not other cancers) explain the most variance, possibly reflecting modern susceptibilities, whilst cancer may act through many rare variants, or the environment. Resultant polygenic scores predict a mean lifespan difference of around five years of life across the deciles.
Researchers have identified environmental risks that predict subsequent psychological and medical problems. Based on these correlational findings, researchers have developed and tested complex developmental models and have examined biological moderating factors (e.g., gene-environment interactions). In this context, we stress the critical need for researchers to use family-based, quasi-experimental designs when trying to integrate genetic and social science research involving environmental variables because these designs rigorously examine causal inferences by testing competing hypotheses. We argue that sibling comparison, offspring of twins or siblings, in vitro fertilization designs, and other genetically informed approaches play a unique role in bridging gaps between basic biological and social science research. We use studies on maternal smoking during pregnancy to exemplify these principles.
“Genetics & the Geography of Health, Behavior, and Attainment”, Daniel W. Belsky, Avshalom Caspi, Louise Arseneault, David L. Corcoran, Benjamin W. Domingue, Kathleen Mullan Harris, Renate Houts, Jonathan Mill, Terrie E. Moffitt, Joseph Prinz, Karen Sugden, Jasmin Wertz, Benjamin Williams, Candice Odgers (2018-07-25):
People’s life chances can be predicted by their neighborhoods. This observation is driving efforts to improve lives by changing neighborhoods. Some neighborhood effects may be causal, supporting neighborhood-level interventions. Other neighborhood effects may reflect selection of families with different characteristics into different neighborhoods, supporting interventions that target families/individuals directly. To test how selection affects different neighborhood-linked problems, we linked neighborhood data with genetic, health, and social-outcome data for >7,000 European-descent UK and US young people in the E-Risk and Add Health Studies. We tested selection/concentration of genetic risks for obesity, schizophrenia, teen-pregnancy, and poor educational outcomes in high-risk neighborhoods, including genetic analysis of neighborhood mobility. Findings argue against genetic selection/concentration as an explanation for neighborhood gradients in obesity and mental-health problems, suggesting neighborhoods may be causal. In contrast, modest genetic selection/concentration was evident for teen-pregnancy and poor educational outcomes, suggesting neighborhood effects for these outcomes should be interpreted with care.
Genomic selection has revolutionized dairy cattle breeding. Since 2000, assays have been developed to genotype large numbers of single-nucleotide polymorphisms (SNPs) at relatively low cost. The first commercial SNP genotyping chip was released with a set of 54,001 SNPs in December 2007. Over 15,000 genotypes were used to determine which SNPs should be used in genomic evaluation of US dairy cattle. Official USDA genomic evaluations were first released in January 2009 for Holsteins and Jerseys, in August 2009 for Brown Swiss, in April 2013 for Ayrshires, and in April 2016 for Guernseys. Producers have accepted genomic evaluations as accurate indications of a bull's eventual daughter-based evaluation. The integration of DNA marker technology and genomics into the traditional evaluation system has doubled the rate of genetic progress for traits of economic importance, decreased generation interval, increased selection accuracy, reduced previous costs of progeny testing, and allowed identification of recessive lethals. [Keywords: genetic evaluation, single-nucleotide polymorphism, SNP, reliability, imputation, haplotype, genotype]
“Learning Dexterous In-Hand Manipulation”, OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba (2018-08-01):
We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object’s appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM
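The domain-randomization idea in this abstract (draw new physical and visual properties for every simulated episode) can be sketched as below. The parameter names and ranges are hypothetical placeholders, not the values OpenAI used, and no real simulator API is called.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SimParams:
    friction: float        # friction coefficient of the object
    object_mass_kg: float  # mass of the manipulated object
    object_rgb: tuple      # appearance: random colour in [0, 1]^3

# Hypothetical randomization ranges, chosen only for illustration.
FRICTION_RANGE = (0.5, 1.5)
MASS_RANGE_KG = (0.05, 0.15)

def sample_sim_params(rng):
    """Draw one random configuration of the simulated world."""
    return SimParams(
        friction=rng.uniform(*FRICTION_RANGE),
        object_mass_kg=rng.uniform(*MASS_RANGE_KG),
        object_rgb=tuple(rng.random() for _ in range(3)),
    )

def randomized_episodes(n_episodes, seed=0):
    """One fresh parameter draw per training episode."""
    rng = random.Random(seed)
    return [sample_sim_params(rng) for _ in range(n_episodes)]

episodes = randomized_episodes(5)
```

A policy trained across many such draws cannot rely on any single value of friction or colour, which is the intuition for why it can transfer to a physical hand whose true parameters are unknown.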
“Large-Scale Visual Speech Recognition”, Brendan Shillingford, Yannis Assael, Matthew W. Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Ben Coppin, Ben Laurie, Andrew Senior, Nando de Freitas (2018-07-13):
This work presents a scalable solution to open-vocabulary visual speech recognition. To achieve this, we constructed the largest existing visual speech recognition dataset, consisting of pairs of text and video clips of faces speaking (3,886 hours of video). In tandem, we designed and trained an integrated lipreading system, consisting of a video processing pipeline that maps raw video to stable videos of lips and sequences of phonemes, a scalable deep neural network that maps the lip videos to sequences of phoneme distributions, and a production-level speech decoder that outputs sequences of words. The proposed system achieves a word error rate (WER) of 40.9% as measured on a held-out set. In comparison, professional lipreaders achieve either 86.4% or 92.9% WER on the same dataset when having access to additional types of contextual information. Our approach significantly improves on other lipreading approaches, including variants of LipNet and of Watch, Attend, and Spell (WAS), which are only capable of 89.8% and 76.8% WER respectively.
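For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, insertions, and deletions) between a transcript and the reference, divided by the number of reference words. The sketch below assumes plain whitespace tokenization, which may differ from the paper's evaluation setup.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER is not capped at 100%, so figures near 90% mean almost nothing was recovered.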
Deep neural networks are susceptible to adversarial attacks. In computer vision, well-crafted perturbations to images can cause neural networks to make mistakes such as confusing a cat with a computer. Previous adversarial attacks have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker. We introduce attacks that instead reprogram the target model to perform a task chosen by the attacker—without the attacker needing to specify or compute the desired output for each test-time input. This attack finds a single adversarial perturbation, that can be added to all test-time inputs to a machine learning model in order to cause the model to perform a task chosen by the adversary—even if the model was not trained to do this task. These perturbations can thus be considered a program for the new task. We demonstrate adversarial reprogramming on six ImageNet classification models, repurposing these models to perform a counting task, as well as classification tasks: classification of MNIST and CIFAR-10 examples presented as inputs to the ImageNet model.
‘Computers’, in the sense of being Turing-complete, are extremely common. Almost any system of sufficient complexity, unless carefully engineered otherwise, may be found to ‘accidentally’ support Turing-completeness somewhere inside it, even systems which would appear to have not the slightest thing to do with computation. Software systems are especially susceptible to this, which often leads to serious security problems as the Turing-complete components can be used to run attacks on the rest of the system.
I provide a running catalogue of systems which have been, surprisingly, demonstrated to be Turing-complete.
Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so in this work we embark on modelling music in the raw audio domain. It has been shown that autoregressive models excel at generating raw audio waveforms of speech, but when applied to music, we find them biased towards capturing local signal structure at the expense of modelling long-range correlations. This is problematic because music exhibits structure at many different timescales. In this work, we explore autoregressive discrete autoencoders (ADAs) as a means to enable autoregressive models to capture long-range correlations in waveforms. We find that they allow us to unconditionally generate piano music directly in the raw audio domain, which shows stylistic consistency across tens of seconds.
Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. This is especially true with high-capacity parametric function approximators, such as deep networks. In this paper, we study how to bridge this gap, by employing uncertainty-aware dynamics models. We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation. Our comparison to state-of-the-art model-based and model-free deep RL algorithms shows that our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e.g., 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization respectively on the half-cheetah task).
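A toy sketch of the two ingredients named in this abstract: an ensemble of dynamics models, and trajectory sampling that propagates particles through randomly chosen ensemble members. Everything here is a simplifying assumption: the "models" are hand-written noisy 1-D linear functions rather than learned networks, and candidate action sequences are drawn uniformly at random for brevity, which need not match the optimizer used in the paper.

```python
import random

def make_toy_ensemble(n_models, rng):
    """Hypothetical ensemble: each member is s' = s + gain * a + noise."""
    gains = [1.0 + rng.uniform(-0.1, 0.1) for _ in range(n_models)]

    def make_model(gain):
        def step(state, action, rng):
            return state + gain * action + rng.gauss(0.0, 0.01)
        return step

    return [make_model(g) for g in gains]

def expected_return(ensemble, state, actions, reward_fn, n_particles, rng):
    """Average return over particles, resampling an ensemble member each step."""
    total = 0.0
    for _ in range(n_particles):
        s = state
        for a in actions:
            model = rng.choice(ensemble)
            s = model(s, a, rng)
            total += reward_fn(s)
    return total / n_particles

def plan(ensemble, state, reward_fn, horizon, n_candidates, n_particles, rng):
    """Random-shooting planner: keep the best-scoring action sequence."""
    best_actions, best_value = None, float("-inf")
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        value = expected_return(ensemble, state, actions, reward_fn, n_particles, rng)
        if value > best_value:
            best_actions, best_value = actions, value
    return best_actions, best_value

# Toy task (hypothetical): drive the state from 0 toward the target 1.
rng = random.Random(0)
ensemble = make_toy_ensemble(5, rng)
best_actions, best_value = plan(
    ensemble, 0.0, lambda s: -(s - 1.0) ** 2,
    horizon=3, n_candidates=50, n_particles=20, rng=rng,
)
```

In a control loop one would typically execute only the first planned action and then replan from the newly observed state.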
Model-free reinforcement learning (RL) methods are succeeding in a growing number of tasks, aided by recent advances in deep learning. However, they tend to suffer from high sample complexity, which hinders their use in real-world domains. Alternatively, model-based reinforcement learning promises to reduce sample complexity, but tends to require careful tuning and to date has succeeded mainly in restrictive domains where simple models are sufficient for learning. In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training. To overcome this issue, we propose to use an ensemble of models to maintain the model uncertainty and regularize the learning process. We further show that the use of likelihood ratio derivatives yields much more stable learning than backpropagation through time. Altogether, our approach Model-Ensemble Trust-Region Policy Optimization (ME-TRPO) significantly reduces the sample complexity compared to model-free deep RL methods on challenging continuous control benchmark tasks.
While many recent advances in deep reinforcement learning (RL) rely on model-free methods, model-based approaches remain an alluring prospect for their potential to exploit unsupervised data to learn an environment model. In this work, we provide an extensive study on the design of deep generative models for RL environments and propose a sample-efficient and robust method to learn the model of Atari environments. We deploy this model and propose generative adversarial tree search (GATS), a deep RL algorithm that learns the environment model and implements Monte Carlo tree search (MCTS) on the learned model for planning. While MCTS on the learned model is computationally expensive, similar to AlphaGo, GATS follows depth-limited MCTS. GATS employs a deep Q network (DQN) and learns a Q-function to assign values to the leaves of the tree in MCTS. We theoretically analyze GATS vis-a-vis the bias-variance trade-off and show GATS is able to mitigate the worst-case error in the Q-estimate. While we were expecting GATS to enjoy better sample complexity and faster convergence to better policies, surprisingly, GATS fails to outperform DQN. We provide a study in which we show why depth-limited MCTS fails to perform desirably.
Learning to control robots directly based on images is a primary challenge in robotics. However, many existing reinforcement learning approaches require iteratively obtaining millions of robot samples to learn a policy, which can take significant time. In this paper, we focus on learning a realistic world model capturing the dynamics of scene changes conditioned on robot actions. Our dreaming model can emulate samples equivalent to a sequence of images from the actual environment, technically by learning an action-conditioned future representation/scene regressor. This allows the agent to learn action policies (i.e., visuomotor policies) by interacting with the dreaming model rather than the real-world. We experimentally confirm that our dreaming model enables robot learning of policies that transfer to the real-world.
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected immediate reward plus the expected future rewards. The latter are related to bias problems in temporal difference (TD) learning and to high variance problems in Monte Carlo (MC) learning. Both problems are even more severe when rewards are delayed. RUDDER aims at making the expected future rewards zero, which simplifies Q-value estimation to computing the mean of the immediate reward. We propose the following two new concepts to push the expected future rewards toward zero. (i) Reward redistribution that leads to return-equivalent decision processes with the same optimal policies and, when optimal, zero expected future rewards. (ii) Return decomposition via contribution analysis which transforms the reinforcement learning task into a regression task at which deep learning excels. On artificial tasks with delayed rewards, RUDDER is significantly faster than MC and exponentially faster than Monte Carlo Tree Search (MCTS), TD(λ), and reward shaping approaches. At Atari games, RUDDER on top of a Proximal Policy Optimization (PPO) baseline improves the scores, which is most prominent at games with delayed rewards. Source code is available at https://github.com/ml-jku/rudder and demonstration videos at https://goo.gl/EQerZV.
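A much-simplified sketch of the reward-redistribution idea: given a per-step prediction of the final return (here a hand-written list standing in for a trained regression model), each step is paid the change in that prediction, so credit lands where the outcome became predictable while the episode's total return is unchanged. Placing the residual prediction error on the last step is an assumption made here for simplicity, not necessarily what the RUDDER code does.

```python
def redistribute_reward(return_predictions, episode_return):
    """Turn per-step return predictions into per-step rewards.

    The redistributed rewards sum exactly to `episode_return`
    (the return-equivalence property), up to floating-point error.
    """
    if not return_predictions:
        raise ValueError("need at least one time step")
    rewards = []
    previous = 0.0
    for prediction in return_predictions:
        rewards.append(prediction - previous)
        previous = prediction
    # Assumption: any leftover prediction error is assigned to the final step.
    rewards[-1] += episode_return - previous
    return rewards

# Toy episode (hypothetical): the only real reward, 1.0, arrives at the end,
# but the return became largely predictable after the key action at step 2.
predictions = [0.0, 0.0, 0.8, 0.8, 1.0]
new_rewards = redistribute_reward(predictions, episode_return=1.0)
```

In this toy episode most of the reward moves from the final step to step 2, so the delay between the decisive action and its reward disappears.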
State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions and decision-making. At the same time, RNNs can be hard to specify and train for non-experts. Training RNNs is notoriously tricky, particularly as the common strategy to approximate gradients back in time, called truncated Back-prop Through Time (BPTT), can be sensitive to the truncation window. Further, domain-expertise—which can usually help constrain the function class and so improve trainability—can be difficult to incorporate into complex recurrent units used within RNNs. In this work, we explore how to use multi-step predictions, as a simple and general approach to inject prior knowledge, while retaining much of the generality and learning power behind RNNs. In particular, we revisit the idea of using predictions to construct state and ask: does constraining (parts of) the state to consist of predictions about the future improve RNN trainability? We formulate a novel RNN architecture, called a General Value Function Network (GVFN), where each internal state component corresponds to a prediction about the future represented as a value function. We first provide an objective for optimizing GVFNs, and derive several algorithms to optimize this objective. We then show that GVFNs are more robust to the truncation level, in many cases only requiring one-step gradient updates.
We present adversarial active exploration for inverse dynamics model learning, a simple yet effective learning scheme that incentivizes exploration in an environment without any human intervention. Our framework consists of a deep reinforcement learning (DRL) agent and an inverse dynamics model contesting with each other. The former collects training samples for the latter, with an objective to maximize the error of the latter. The latter is trained with samples collected by the former, and generates rewards for the former when it fails to predict the actual action taken by the former. In such a competitive setting, the DRL agent learns to generate samples that the inverse dynamics model fails to predict correctly, while the inverse dynamics model learns to adapt to the challenging samples. We further propose a reward structure that ensures the DRL agent collects only moderately hard samples, not overly hard ones that prevent the inverse model from predicting effectively. We evaluate the effectiveness of our method on several robotic arm and hand manipulation tasks against multiple baseline models. Experimental results show that our method is comparable to those directly trained with expert demonstrations, and superior to the other baselines even without any human priors.
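One hypothetical way to express a reward that favors moderately hard samples is to penalize the distance between the inverse model's prediction loss and a target difficulty. The sketch below is only an illustration of that idea: the function name, the `target_loss` parameter, and the absolute-difference form are assumptions, not details taken from the abstract.

```python
def exploration_reward(inverse_model_loss: float, target_loss: float) -> float:
    """Hypothetical reward for the exploring agent.

    The reward is highest (zero) when the inverse model's loss on a sample
    equals target_loss, and falls off as the sample gets easier or harder.
    """
    return -abs(inverse_model_loss - target_loss)


# Illustrative values: a sample near the target difficulty scores better than
# one that is trivially easy or one that is far too hard.
easy = exploration_reward(0.1, target_loss=1.0)
moderate = exploration_reward(1.1, target_loss=1.0)
too_hard = exploration_reward(5.0, target_loss=1.0)
```

Under this assumed form, an agent maximizing the reward has no incentive to seek out samples the inverse model can never learn to predict.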
“The Sound of Pixels”, Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba (2018-04-09):
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel. Our approach capitalizes on the natural synchronization of the visual and audio modalities to learn models that jointly parse sounds and images, without requiring additional manual supervision. Experimental results on a newly collected MUSIC dataset show that our proposed Mix-and-Separate framework outperforms several baselines on source separation. Qualitative results suggest our model learns to ground sounds in vision, enabling applications such as independently adjusting the volume of sound sources.
On the general topic of animal model external validity & translation to humans, a number of op-eds, reviews, and meta-analyses have been done; reading through some of the literature up to March 2013, I would summarize them as indicating that the animal research literature in general is of considerably lower quality than human research, and that for those and intrinsic biological reasons, the probability of meaningful transfer from animal to human can be astoundingly low, far below 50% and in some categories of results, 0%.
The primary reasons identified for this poor performance are generally: small samples (much smaller than the already underpowered norms in human research); lack of blinding in taking measurements; pseudo-replication due to animals being correlated by genetic relatedness/living in same cage/same room/same lab; extensive non-normality in data; large differences between labs due to local differences in reagents/procedures/personnel, illustrating the importance of “tacit knowledge”; publication bias (small cheap samples + little perceived ethical need to publish + no preregistration norms); unnatural & unnaturally easy lab environments (more naturalistic environments both offer more realistic measurements & challenge animals); large genetic differences due to inbreeding/engineering/drift of lab strains, which mean the same treatment can produce dramatically different results in different strains (or sexes) of the same species; and differences between species, which can respond differently, and none of which may be like humans in the relevant biological way in the first place.
So it is no wonder that “we can cure cancer in mice but not people” and almost all amazing breakthroughs in animals never make it to human practice; medicine & biology are difficult.
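The pseudo-replication point above can be made concrete with the standard design-effect approximation for clustered data, where correlated animals in the same cage count for less than independent ones. This is a rough sketch assuming equal-sized cages and a single intraclass correlation; the function name and all numbers are illustrative and do not come from any study discussed here.

```python
def effective_sample_size(n_animals: int, animals_per_cage: int,
                          intraclass_correlation: float) -> float:
    """Approximate number of independent observations in a caged sample.

    Uses the design effect 1 + (m - 1) * ICC, where m is the cage size and
    ICC is the correlation between animals sharing a cage (both assumed).
    """
    design_effect = 1 + (animals_per_cage - 1) * intraclass_correlation
    return n_animals / design_effect


# Illustrative experiment: 40 mice housed 5 to a cage.
independent = effective_sample_size(40, 5, 0.0)   # no cage effect
correlated = effective_sample_size(40, 5, 0.5)    # moderate cage effect
identical = effective_sample_size(40, 5, 1.0)     # cage-mates fully correlated
```

With no cage effect all 40 animals count; with fully correlated cage-mates the sample shrinks to the 8 cages, so analyzing such data as if it had 40 independent animals overstates the evidence.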
Problems with social experiments and evaluating them, loopholes, causes, and suggestions; non-experimental methods systematically deliver false results, as most interventions fail or have small effects.
Dasatinib, sold under the brand name Sprycel among others, is a targeted therapy used to treat certain cases of chronic myelogenous leukemia (CML) and acute lymphoblastic leukemia (ALL). Specifically it is used to treat cases that are Philadelphia chromosome-positive (Ph+). It is taken by mouth.
Quercetin is a plant flavonol from the flavonoid group of polyphenols. It is found in many fruits, vegetables, leaves, seeds, and grains; red onions and kale are common foods containing appreciable amounts of quercetin. Quercetin has a bitter flavor and is used as an ingredient in dietary supplements, beverages, and foods.
Information from many large data bases and published studies was integrated to estimate the age-specific spontaneous abortion rate in an economically-developed human population. Accuracy was tested with published data from a diverse array of studies. Spontaneous abortion was found to be: i) the predominant outcome of fertilization and ii) a natural and inevitable part of human reproduction at all ages. The decision to reproduce is inextricably coupled with the production of spontaneous abortions with high probability, and the decision to have a large family leads to many spontaneous abortions with virtual certainty. The lifetime number of spontaneous abortions was estimated for a “canonical” woman (constrained to have average age at marriage, first birth, inter-birth intervals, and family size) in two populations: one with and the other without effective birth control (including free access to elective abortions). Birth control was found to reduce lifetime abortions more than 6-fold.
The Study of Mathematically Precocious Youth (SMPY) is a prospective longitudinal survey study of persons identified by scores of 700 or higher on a section of the SAT Reasoning Test before age 13 years. It is one of the longest-running longitudinal studies of gifted youth in world history. Study scholars have used survey data from study participants to advance hypotheses about talent development and occupational preferences.
Spice and Wolf is a Japanese light novel series written by Isuna Hasekura, with illustrations by Jū Ayakura. ASCII Media Works has published 22 novels since February 2006 under their Dengeki Bunko imprint. ASCII Media Works reported that as of October 2008, over 2.2 million copies of the first nine novels have been sold in Japan. The series has been called a "unique fantasy" by Mainichi Shimbun due to the plot focusing on economics, trade, and peddling rather than the typical staples of fantasy such as swords and magic. Yen Press licensed the light novels and is releasing them in English in North America. ASCII Media Works has published three volumes of a spin-off light novel series titled Wolf and Parchment since September 2016.
Subscription page for the monthly gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives since December 2013.