“The Long-Term Effects of Cash Assistance”, Price & Song 2016 (An unusually long-term followup to one of the old American basic income experiments. No large harmful effects… but no large benefits either, nothing remotely like we observe in the Third World BI/transfer experiments. And if money helps with social outcomes, it should’ve helped more in the 1970s than it does now, so I feel more pessimistic about GiveDirectly/YC’s USA BI experiment.)
“Uncertainty in Deep Learning”, Gal 2016 (using dropout to turn NNs into ensembles of Bayesian NNs, allowing extraction of posterior distributions and thus uncertainty of outputs, which helps active learning & reinforcement learning)
You can see this as a way to most economically increase your dataset size by only labeling the most valuable instances; it can also be used to improve dataset quality by targeting instead the errors a model makes on a noisy corpus for examination by the oracle; and finally, it can be seen as a demonstration of the advantages of reinforcement learning over simple supervised learning over heaps of data—given the curse of dimensionality, most data is useless for training a model because it is so redundant and already a solved problem, and the data the model needs to improve its performance is a needle in a haystack. So by giving a model RL capabilities, you improve its supervised/inference performance! The way I put this: “tool AIs want to be agent AIs”.
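The uncertainty-driven active-learning loop described above can be sketched in miniature. The following is a minimal illustrative toy, not code from Gal 2016: the "network" is just a one-layer weighted sum, and the weights and pool are made up, but the mechanism is the one the thesis names: keep dropout on at test time, read predictive variance off the stochastic passes, and spend the labeling budget on the most uncertain point.

```python
import random
import statistics

def mc_dropout_predict(weights, x, p_drop=0.5, passes=100):
    """Keep dropout active at *test* time and average many stochastic
    forward passes (the Gal 2016 trick); the spread across passes
    approximates the model's predictive uncertainty.  The 'network'
    here is just a one-layer weighted sum, purely for illustration."""
    outputs = []
    for _ in range(passes):
        # sample a fresh dropout mask each pass (inverted-dropout scaling)
        out = sum(w * xi / (1 - p_drop)
                  for w, xi in zip(weights, x)
                  if random.random() >= p_drop)
        outputs.append(out)
    return statistics.fmean(outputs), statistics.pvariance(outputs)

def most_uncertain(weights, pool):
    """Active-learning acquisition step: send the unlabeled point with
    the largest predictive variance to the oracle for labeling."""
    return max(pool, key=lambda x: mc_dropout_predict(weights, x)[1])
```

The acquisition rule here (maximum predictive variance) is the simplest choice; the same loop accommodates fancier criteria such as BALD or expected error reduction.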
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up the reasonably-interesting changes and send them out to the mailing list along with a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as an RSS feed). Submissions are categorized similarly to the monthly newsletter and typically will be collated there.
In multivariate quantitative genetics, a genetic correlation is the proportion of variance that two traits share due to genetic causes, the correlation between the genetic influences on a trait and the genetic influences on a different trait estimating the degree of pleiotropy or causal overlap. A genetic correlation of 0 implies that the genetic effects on one trait are independent of the other, while a correlation of 1 implies that all of the genetic influences on the two traits are identical. The bivariate genetic correlation can be generalized to inferring genetic latent variable factors across > 2 traits using factor analysis. Genetic correlation models were introduced into behavioral genetics in the 1970s–1980s.
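Concretely, the definition above corresponds to a standard quantitative-genetics decomposition (the formulas below are the textbook Falconer-style statement, supplied for reference rather than taken from the quoted text):

```latex
% genetic correlation between traits A and B, from additive genetic values g_A, g_B:
r_g = \frac{\operatorname{cov}(g_A, g_B)}{\sqrt{\operatorname{var}(g_A)\,\operatorname{var}(g_B)}}
% the phenotypic correlation then decomposes into genetic and environmental parts,
% weighted by the square roots of the two heritabilities h^2_A, h^2_B:
r_P = r_g \sqrt{h_A^2 h_B^2} + r_E \sqrt{(1 - h_A^2)(1 - h_B^2)}
```

So $r_g = 0$ in the quoted definition corresponds to $\operatorname{cov}(g_A, g_B) = 0$, and $r_g = 1$ to perfectly collinear additive genetic values.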
It is possible that heritable variance in personality characteristics does not reflect (only) genetic and biological processes specific to personality per se. We tested the possibility that Five-Factor Model personality domains and facets, as rated by people themselves and their knowledgeable informants, reflect polygenic influences that have been previously associated with educational attainment. In a sample of over 3,000 adult Estonians, polygenic scores for educational attainment, based on small contributions from more than 150,000 genetic variants, were correlated with various personality traits, mostly from the Neuroticism and Openness domains. The correlations of personality characteristics with educational attainment-related polygenic influences reflected almost entirely their correlations with phenotypic educational attainment. Structural equation modeling of the associations between polygenic risk, personality (a weighted aggregate of education-related facets) and educational attainment lent the strongest relative support to the possibility of educational attainment mediating (explaining) some of the heritable variance in personality traits.
Empathy is the drive to identify the mental states of others and respond to these with an appropriate emotion. Systemizing is the drive to analyse or build lawful systems. Difficulties in empathy have been identified in different psychiatric conditions including autism and schizophrenia. In this study, we conducted genome-wide association studies of empathy and systemizing using the Empathy Quotient (EQ) (n = 46,861) and the Systemizing Quotient-Revised (SQ-R) (n = 51,564) in participants from 23andMe, Inc. We confirmed significant sex differences in performance on both measures, with a male advantage on the SQ-R and a female advantage on the EQ. We found highly significant heritability explained by single nucleotide polymorphisms (SNPs) for both traits (EQ: 0.11 ± 0.014, p = 1.7 × 10⁻¹⁴; SQ-R: 0.12 ± 0.012, p = 1.2 × 10⁻²⁰), and these were similar for males and females. However, genes with higher expression in the male brain appear to contribute to the male advantage on the SQ-R. Finally, we identified significant genetic correlations between high scores for empathy and risk for schizophrenia (p = 2.5 × 10⁻⁵), and between high scores for systemizing and higher educational attainment (p = 5 × 10⁻⁴). These results shed light on the genetic contribution to individual differences in empathy and systemizing, two major cognitive functions of the human brain.
We conducted a genome-wide meta-analysis of cognitive empathy using the ‘Reading the Mind in the Eyes’ Test (Eyes Test) in 88,056 Caucasian research participants (44,574 females and 43,482 males) from 23andMe Inc., and an additional 1,497 Caucasian participants (891 females and 606 males) from the Brisbane Longitudinal Twin Study (BLTS). We confirmed a female advantage on the Eyes Test (Cohen’s d = 0.21, p < 0.001), and identified a locus in 3p26.1 that is associated with scores on the Eyes Test in females (rs7641347, p_meta = 1.57 × 10⁻⁸). Common single nucleotide polymorphisms (SNPs) explained 20% of the twin heritability and 5.6% (±0.76; p = 1.72 × 10⁻¹³) of the total trait variance in both sexes. Finally, we identified significant genetic correlation between the Eyes Test and measures of empathy (the Empathy Quotient), openness (NEO-Five Factor Inventory), and different measures of educational attainment and cognitive aptitude, and show that the genetic determinants of striatal volumes (caudate nucleus, putamen, and nucleus accumbens) are positively correlated with the genetic determinants of performance on the Eyes Test.
Neural networks augmented with external memory have the ability to learn algorithmic solutions to complex tasks. These models appear promising for applications such as language modeling and machine translation. However, they scale poorly in both space and time as the amount of memory grows — limiting their applicability to real-world domains. Here, we present an end-to-end differentiable memory access scheme, which we call Sparse Access Memory (SAM), that retains the representational power of the original approaches whilst training efficiently with very large memories. We show that SAM achieves asymptotic lower bounds in space and time complexity, and find that an implementation runs 1,000× faster and with 3,000× less physical memory than non-sparse models. SAM learns with comparable data efficiency to existing models on a range of synthetic tasks and one-shot Omniglot character recognition, and can scale to tasks requiring 100,000s of time steps and memories. As well, we show how our approach can be adapted for models that maintain temporal associations between memories, as with the recently introduced Differentiable Neural Computer.
Conversational speech recognition has served as a flagship speech recognition task since the release of the Switchboard corpus in the 1990s. In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity. The error rate of professional transcribers is 5.9% for the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic, and 11.3% for the CallHome portion where friends and family members have open-ended conversations. In both cases, our automated system establishes a new state of the art, and edges past the human benchmark, achieving error rates of 5.8% and 11.0%, respectively. The key to our system’s performance is the use of various convolutional and LSTM acoustic model architectures, combined with a novel spatial smoothing method and lattice-free MMI acoustic training, multiple recurrent neural network language modeling approaches, and a systematic use of system combination.
“Video Pixel Networks”, Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu (2016-10-03):
We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.
Applying end-to-end learning to solve complex, interactive, pixel-driven control tasks on a robot is an unsolved problem. Deep Reinforcement Learning algorithms are too slow to achieve performance on a real robot, but their potential has been demonstrated in simulated environments. We propose using progressive networks to bridge the reality gap and transfer learned policies from simulation to the real world. The progressive net approach is a general framework that enables reuse of everything from low-level visual features to high-level policies for transfer to new tasks, enabling a compositional, yet simple, approach to building complex skills. We present an early demonstration of this approach with a number of experiments in the domain of robot manipulation that focus on bridging the reality gap. Unlike other proposed approaches, our real-world experiments demonstrate successful task learning from raw visual input on a fully actuated robot manipulator. Moreover, rather than relying on model-based trajectory optimisation, the task learning is accomplished using only deep reinforcement learning and sparse rewards.
Yahoo’s recently open-sourced neural network, open_nsfw, is a fine-tuned Residual Network which scores images on a scale of 0 to 1 on their suitability for use in the workplace…What makes an image NSFW, according to Yahoo? I explore this question with a clever new visualization technique by Nguyen et al…Like Google’s Deep Dream, this visualization trick works by maximally activating certain neurons of the classifier. Unlike Deep Dream, we optimize these activations by performing descent on a parameterization of the manifold of natural images.
[Demonstration of an unusual use of backpropagation to ‘optimize’ a neural network: instead of taking a piece of data to input to a neural network and then updating the neural network to change its output slightly towards some desired output (such as a correct classification), one can instead update the input so as to make the neural net output slightly more towards the desired output. When using a image classification neural network, this reversed form of optimization will ‘hallucinate’ or ‘edit’ the ‘input’ to make it more like a particular class of images. In this case, a porn/NSFW-detecting NN is reversed so as to make images more (or less) “porn-like”. Goh runs this process on various images like landscapes, musical bands, or empty images; the maximally/minimally porn-like images are disturbing, hilarious, and undeniably pornographic in some sense.]
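The reversed-optimization trick can be sketched in miniature: freeze the model and run gradient *ascent* on the input instead of on the weights. The quadratic "classifier" below is purely illustrative (the real open_nsfw visualizations use a deep CNN and a natural-image parameterization), but the backpropagate-into-the-input mechanism is the same.

```python
def class_score(x, target):
    """Toy differentiable 'classifier' score: higher the closer the
    input is to the class prototype `target` (a stand-in for a real
    CNN's class logit; everything here is illustrative)."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def score_grad(x, target):
    # analytic gradient of class_score with respect to the *input* x
    return [-2.0 * (xi - ti) for xi, ti in zip(x, target)]

def hallucinate(x, target, lr=0.1, steps=200):
    """Reversed optimization: the classifier's weights stay frozen and
    gradient ascent is run on the input, editing the 'image' until the
    classifier scores it ever more strongly as the target class."""
    for _ in range(steps):
        x = [xi + lr * gi for xi, gi in zip(x, score_grad(x, target))]
    return x
```

Running ascent in the opposite direction (subtracting the gradient) would correspondingly make the input score *less* like the class, the "minimally porn-like" direction Goh also explores.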
Autonomous AI systems (Agent AIs) trained using reinforcement learning can do harm when they take wrong actions, especially superintelligent Agent AIs. One solution would be to eliminate their agency by not giving AIs the ability to take actions, confining them to purely informational or inferential tasks such as classification or prediction (Tool AIs), and have all actions be approved & executed by humans, giving equivalently superintelligent results without the risk.
I argue that this is not an effective solution for two major reasons. First, because Agent AIs will by definition be better at actions than Tool AIs, giving an economic advantage. Secondly, because Agent AIs will be better at inference & learning than Tool AIs, and this is inherently due to their greater agency: the same algorithms which learn how to perform actions can be used to select important datapoints to learn inference over, how long to learn, how to more efficiently execute inference, how to design themselves, how to optimize hyperparameters, how to make use of external resources such as long-term memories or external software or large databases or the Internet, and how best to acquire new data. All of these actions will result in Agent AIs more intelligent than Tool AIs, in addition to their greater economic competitiveness. Thus, Tool AIs will be inferior to Agent AIs in both actions and intelligence, implying use of Tool AIs is an even more highly unstable equilibrium than previously argued, as users of Agent AIs will be able to outcompete them on two dimensions (and not just one).
The overall goal of this work was to measure the efficacy of fMRI for predicting whether a dog would be a successful service dog. The training and imaging were performed in 50 dogs entering advanced training at 17-21 months of age. fMRI responses were measured while each dog observed hand signals indicating either reward or no reward and given by both a familiar handler and a stranger. 49 dogs successfully completed fMRI training and scanning. Of these, 33 eventually completed service training and were matched with a person, while 10 were released for behavioral reasons. Using anatomically defined regions-of-interest in the ventral caudate, amygdala, and visual cortex, we developed a classifier based on the dogs’ outcomes. We found that responses in the stranger condition were sufficient to develop an accurate brain-based classifier. On all data, the classifier had a positive predictive value of 96% with 10% false positives. The area under the receiver operating characteristic curve was 0.90 (0.79 with 4-fold cross-validation, p = 0.02), indicating a significant diagnostic capability. Within the stranger condition, the differential response to [reward – no reward] in ventral caudate was positively correlated with a successful outcome, while the differential response in the amygdala was negatively correlated to outcome. These results show that successful service dogs transfer knowledge to strangers as indexed by ventral caudate activity without excessive arousal as measured in the amygdala.
Montaillou is a book by the French historian Emmanuel Le Roy Ladurie first published in 1975. It was first translated into English in 1978 by Barbara Bray, and has been subtitled The Promised Land of Error and Cathars and Catholics in a French Village.
Emmanuel Bernard Le Roy Ladurie is a French historian whose work is mainly focused upon Languedoc in the Ancien Régime, particularly the history of the peasantry. One of the leading historians of France, Le Roy Ladurie has been called the "standard-bearer" of the third generation of the Annales school and the "rock star of the medievalists", noted for his work in social history.
Confessions of an English Opium-Eater (1821) is an autobiographical account written by Thomas De Quincey, about his laudanum addiction and its effect on his life. The Confessions was "the first major work De Quincey published and the one which won him fame almost overnight..."
The Tales of Ise is a Japanese uta monogatari, or collection of waka poems and associated narratives, dating from the Heian period. The current version collects 125 sections, with each combining poems and prose, giving a total of 209 poems in most versions.
Subscription page for the monthly gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives since December 2013.
Newsletter tag: archive of all issues back to 2013 for the gwern.net newsletter (monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.)
Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize some notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
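The reward-maximization loop in the definition above can be made concrete with a minimal sketch (a toy tabular Q-learning agent on a 5-state corridor; every detail here is illustrative, chosen for brevity rather than drawn from any particular paper):

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy 'corridor' MDP: start in state 0,
    actions 0=left/1=right, reward 1 only for entering the rightmost
    state.  Maximizing cumulative reward means learning to walk right."""
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy action selection (ties broken toward 'right'
            # purely to keep this toy fast to converge)
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = 1 if Q[s][1] >= Q[s][0] else 0
            s_next = s + 1 if a == 1 else max(0, s - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # one-step temporal-difference update toward r + gamma * max Q(s', ·)
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

After training, the greedy policy (pick the action with the larger Q-value in each state) heads right everywhere, illustrating how an agent discovers reward-maximizing behavior from scalar feedback alone, with no labeled examples.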