“Speech2Face: Learning the Face Behind a Voice”, (2019-05-23):
How much can we infer about a person’s looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform this task using millions of natural Internet/YouTube videos of people speaking. During training, our model learns voice-face correlations that allow it to produce images that capture various physical attributes of the speakers such as age, gender and ethnicity. This is done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. We evaluate and numerically quantify how–and in what manner–our Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers.
There is ample evidence that a human face provides signals of human personality and behaviour. Previous studies have found associations between the features of artificial composite facial images and attributions of personality traits by human experts. We present new findings demonstrating the statistically-significant prediction of a wider set of personality features (all the Big Five personality traits) for both men and women using real-life static facial images. Volunteer participants (N = 12,447) provided their face photographs (31,367 images) and completed a self-report measure of the Big Five traits. We trained a cascade of artificial neural networks (ANNs) on a large labeled dataset to predict self-reported Big Five scores. The highest correlations were found for conscientiousness (0.360 for men and 0.335 for women), exceeding the results obtained in prior studies. The findings provide strong support for the hypothesis that it is possible to predict multidimensional personality profiles from static facial images using ANNs trained on large labeled datasets.
While Generative Adversarial Networks (GANs) have seen huge successes in image synthesis tasks, they are notoriously difficult to adapt to different datasets, in part due to instability during training and sensitivity to hyperparameters. One commonly accepted reason for this instability is that gradients passing from the discriminator to the generator become uninformative when there isn’t enough overlap in the supports of the real and fake distributions. In this work, we propose the Multi-Scale Gradient Generative Adversarial Network (MSG-GAN), a simple but effective technique for addressing this by allowing the flow of gradients from the discriminator to the generator at multiple scales. This technique provides a stable approach for high resolution image synthesis, and serves as an alternative to the commonly used progressive growing technique. We show that MSG-GAN converges stably on a variety of image datasets of different sizes, resolutions and domains, as well as different types of loss functions and architectures, all with the same set of fixed hyperparameters. When compared to state-of-the-art GANs, our approach matches or exceeds the performance in most of the cases we tried.
“mixup: Beyond Empirical Risk Minimization”, (2017-10-25):
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
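The core mixup operation is a one-line convex combination of inputs and one-hot labels; a minimal numpy sketch (the function and toy arrays are illustrative, not the authors' code):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix a pair of examples and their one-hot labels.

    lambda ~ Beta(alpha, alpha); the virtual training example is the convex
    combination lam*x1 + (1-lam)*x2, and likewise for the labels.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Two toy "images" with one-hot labels for classes 0 and 1.
x, y = mixup(np.ones((4, 4)), np.array([1.0, 0.0]),
             np.zeros((4, 4)), np.array([0.0, 1.0]))
assert abs(y.sum() - 1.0) < 1e-9   # mixed label is still a distribution
assert abs(x[0, 0] - y[0]) < 1e-9  # pixel value equals the class-0 weight
```

Training then proceeds on these virtual examples with the usual cross-entropy loss, which is what pushes the network toward linear behavior between training points.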
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
“Sem-GAN: Semantically-Consistent Image-to-Image Translation”, (2018-07-12):
Unpaired image-to-image translation is the problem of mapping an image in the source domain to one in the target domain, without requiring corresponding image pairs. To ensure the translated images are realistically plausible, recent works, such as CycleGAN, demand this mapping to be invertible. While this requirement demonstrates promising results when the domains are unimodal, its performance is unpredictable in a multi-modal scenario such as in an image segmentation task. This is because invertibility does not necessarily enforce semantic correctness. To this end, we present a semantically-consistent GAN framework, dubbed Sem-GAN, in which the semantics are defined by the class identities of image segments in the source domain as produced by a semantic segmentation algorithm. Our proposed framework includes consistency constraints on the translation task that, together with the GAN loss and the cycle-constraints, enforces that the images when translated will inherit the appearances of the target domain, while (approximately) maintaining their identities from the source domain. We present experiments on several image-to-image translation tasks and demonstrate that Sem-GAN improves the quality of the translated images significantly, sometimes by more than 20% on the FCN score. Further, we show that semantic segmentation models, trained with synthetic images translated via Sem-GAN, lead to significantly better segmentation results than other variants.
Unsupervised image-to-image translation techniques are able to map local texture between two domains, but they are typically unsuccessful when the domains require larger shape change. Inspired by semantic segmentation, we introduce a discriminator with dilated convolutions that is able to use information from across the entire image to train a more context-aware generator. This is coupled with a multi-scale perceptual loss that is better able to represent error in the underlying shape of objects. We demonstrate that this design is more capable of representing shape deformation in a challenging toy dataset, plus in complex mappings with significant dataset variation between humans, dolls, and anime faces, and between cats and dogs.
“Detecting GAN generated errors”, (2019-12-02):
Despite an impressive performance from the latest GANs for generating hyper-realistic images, GAN discriminators have difficulty evaluating the quality of an individual generated sample. This is because the task of evaluating the quality of a generated image differs from deciding if an image is real or fake. A generated image could be perfect except in a single area but still be detected as fake. Instead, we propose a novel approach for detecting where errors occur within a generated image. By collaging real images with generated images, we compute for each pixel, whether it belongs to the real distribution or generated distribution. Furthermore, we leverage attention to model long-range dependency; this allows detection of errors which are reasonable locally but not holistically. For evaluation, we show that our error detection can act as a quality metric for an individual image, unlike FID and IS. We leverage Improved Wasserstein GAN, BigGAN, and StyleGAN to show a ranking based on our metric correlates impressively with FID scores. Our work opens the door for better understanding of GAN errors and the ability to select the best samples from a GAN model.
Among the major remaining challenges for generative adversarial networks (GANs) is the capacity to synthesize globally and locally coherent images with object shapes and textures indistinguishable from real images. To target this issue we propose an alternative U-Net based discriminator architecture, borrowing the insights from the segmentation literature. The proposed U-Net based architecture allows to provide detailed per-pixel feedback to the generator while maintaining the global coherence of synthesized images, by providing the global image feedback as well. Empowered by the per-pixel response of the discriminator, we further propose a per-pixel consistency regularization technique based on the CutMix data augmentation, encouraging the U-Net discriminator to focus more on semantic and structural changes between real and fake images. This improves the U-Net discriminator training, further enhancing the quality of generated samples. The novel discriminator improves over the state of the art in terms of the standard distribution and image quality metrics, enabling the generator to synthesize images with varying structure, appearance and levels of detail, maintaining global and local realism. Compared to the BigGAN baseline, we achieve an average improvement of 2.7 FID points across FFHQ, CelebA, and the newly introduced COCO-Animals dataset. The code is available at https://github.com/boschresearch/unetgan.
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. leg as opposed to head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout remove informative pixels on training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix consistently outperforms the state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the weakly-supervised localization task. Moreover, unlike previous augmentation methods, our CutMix-trained classifier, when used as a pretrained model, results in consistent performance gains in Pascal detection and MS-COCO image captioning benchmarks. We also show that CutMix improves the model robustness against input corruptions and its out-of-distribution detection performances. Source code and pretrained models are available at https://github.com/clovaai/CutMix-PyTorch .
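The cut-and-paste with proportional label mixing can be sketched in a few lines of numpy (an illustrative sketch with a fixed seed, not the released PyTorch code):

```python
import numpy as np

def cutmix(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Paste a random patch of image x2 into x1 and mix the one-hot labels
    in proportion to the pasted area (lambda = kept fraction of x1)."""
    rng = rng or np.random.default_rng(0)
    H, W = x1.shape[:2]
    lam = rng.beta(alpha, alpha)
    # Patch dimensions chosen so the patch area is (1 - lam) of the image.
    cut_h, cut_w = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = rng.integers(0, H), rng.integers(0, W)   # random patch center
    r0, r1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
    c0, c1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)
    out = x1.copy()
    out[r0:r1, c0:c1] = x2[r0:r1, c0:c1]
    # Recompute lambda from the actual (boundary-clipped) patch area.
    lam = 1 - (r1 - r0) * (c1 - c0) / (H * W)
    return out, lam * y1 + (1 - lam) * y2

img, label = cutmix(np.zeros((8, 8)), np.array([1.0, 0.0]),
                    np.ones((8, 8)), np.array([0.0, 1.0]))
assert abs(label.sum() - 1.0) < 1e-9
assert abs(img.mean() - label[1]) < 1e-9  # pasted fraction = class-1 weight
```

Unlike Cutout-style erasing, every training pixel still carries signal, which is the efficiency argument the abstract makes.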
“Malware Detection by Eating a Whole EXE”, (2017-10-25):
In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community. Building a neural network for such a problem presents a number of interesting challenges that have not occurred in tasks such as image processing or NLP. In particular, we note that detection from raw bytes presents a sequence problem with over two million time steps and a problem where batch normalization appears to hinder the learning process. We present our initial work in building a solution to tackle this problem, which has linear complexity dependence on the sequence length, and allows for interpretable sub-regions of the binary to be identified. In doing so we will discuss the many challenges in building a neural network to process data at this scale, and the methods we used to work around them.
A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. Although the problem has largely been overcome via carefully constructed initializations and batch normalization, architectures incorporating skip-connections such as highway and resnets perform much better than standard feedforward architectures despite well-chosen initialization and batch normalization. In this paper, we identify the shattered gradients problem. Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth resulting in gradients that resemble white noise whereas, in contrast, the gradients in architectures with skip-connections are far more resistant to shattering, decaying sublinearly. Detailed empirical evidence is presented in support of the analysis, on both fully-connected networks and convnets. Finally, we present a new “looks linear” (LL) initialization that prevents shattering, with preliminary experiments showing the new initialization allows to train very deep networks without the addition of skip-connections.
Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.
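The depth-independence of orthogonal initializations is easy to see numerically. The following numpy sketch (illustrative sizes, a deep *linear* network as in the paper's restricted setting) contrasts a product of scaled Gaussian layers with a product of random orthogonal layers:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 64, 50
x = rng.standard_normal(n)
x /= np.linalg.norm(x)          # unit-norm input signal

def propagate(make_layer):
    """Return the norm of x after passing through `depth` linear layers."""
    h = x
    for _ in range(depth):
        h = make_layer() @ h
    return np.linalg.norm(h)

# Scaled Gaussian layers: singular values spread around 1, so the signal
# norm performs a multiplicative random walk and drifts with depth.
gauss = propagate(lambda: rng.standard_normal((n, n)) / np.sqrt(n))
# Random orthogonal layers (QR of a Gaussian matrix): every singular value
# is exactly 1, so the norm is preserved regardless of depth.
ortho = propagate(lambda: np.linalg.qr(rng.standard_normal((n, n)))[0])

assert abs(ortho - 1.0) < 1e-6  # orthogonal maps are isometries
```

This norm preservation is the mechanism behind the depth-independent learning times the paper proves, and behind faithful gradient propagation in the backward pass.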
Convolutional Neural Networks spread through computer vision like a wildfire, impacting almost all visual tasks imaginable. Despite this, few researchers dare to train their models from scratch. Most work builds on one of a handful of pre-trained models, and fine-tunes or adapts these for specific tasks. This is in large part due to the difficulty of properly initializing these networks from scratch. A small miscalibration of the initial weights leads to vanishing or exploding gradients, as well as poor convergence properties. In this work we present a fast and simple data-dependent initialization procedure, that sets the weights of a network such that all units in the network train at roughly the same rate, avoiding vanishing or exploding gradients. Our initialization matches the current state-of-the-art unsupervised or self-supervised pre-training methods on standard computer vision tasks, such as image classification and object detection, while being roughly three orders of magnitude faster. When combined with pre-training methods, our initialization significantly outperforms prior work, narrowing the gap between supervised and unsupervised pre-training.
We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning. We show that initial representations generated by common random initializations are sufficiently rich to express all functions in the dual kernel space. Hence, though the training objective is hard to optimize in the worst case, the initial weights form a good starting point for optimization. Our dual view also reveals a pragmatic and aesthetic perspective of neural networks and underscores their expressive power.
“Deep Information Propagation”, (2016-11-04):
We study the behavior of untrained neural networks whose weights and biases are randomly distributed using mean field theory. We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks. Our main practical result is to show that random networks may be trained precisely when information can travel through them. Thus, the depth scales that we identify provide bounds on how deep a network may be trained for a specific choice of hyperparameters. As a corollary to this, we argue that in networks at the edge of chaos, one of these depth scales diverges. Thus arbitrarily deep networks may be trained only sufficiently close to criticality. We show that the presence of dropout destroys the order-to-chaos critical point and therefore strongly limits the maximum trainable depth for random networks. Finally, we develop a mean field theory for backpropagation and we show that the ordered and chaotic phases correspond to regions of vanishing and exploding gradient respectively.
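The ordered and chaotic phases can be demonstrated in a small numpy sketch (a toy tanh network with i.i.d. Gaussian weights; the width, depth, and σ_w values are arbitrary illustrative choices, not the paper's experiments): in the ordered phase a small input perturbation dies out with depth, in the chaotic phase it blows up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 256, 40

def final_distance(sigma_w):
    """Propagate two nearby inputs through one random tanh network with
    weight std sigma_w/sqrt(n) and return their final separation."""
    h1 = rng.standard_normal(n)
    h2 = h1 + 1e-3 * rng.standard_normal(n)   # small perturbation of h1
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * (sigma_w / np.sqrt(n))
        h1, h2 = np.tanh(W @ h1), np.tanh(W @ h2)  # same weights for both
    return np.linalg.norm(h1 - h2)

ordered = final_distance(0.5)   # ordered phase: perturbations contract
chaotic = final_distance(2.5)   # chaotic phase: perturbations expand

assert ordered < 1e-6 < chaotic
```

The criticality at σ_w ≈ 1 (for zero bias) is the "edge of chaos" where the depth scale diverges and arbitrarily deep networks remain trainable.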
“On weight initialization in deep neural networks”, (2017-04-28):
A proper initialization of the weights in a neural network is critical to its convergence. Current insights into weight initialization come primarily from linear activation functions. In this paper, I develop a theory for weight initializations with non-linear activations. First, I derive a general weight initialization strategy for any neural network using activation functions differentiable at 0. Next, I derive the weight initialization strategy for the Rectified Linear Unit (RELU), and provide theoretical insights into why the Xavier initialization is a poor choice with RELU activations. My analysis provides a clear demonstration of the role of non-linearities in determining the proper weight initializations.
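The role of the non-linearity can be checked directly. This numpy sketch (illustrative width and depth) propagates a signal through deep random ReLU layers under a Xavier-style Var(w) = 1/fan_in scaling versus a He-style Var(w) = 2/fan_in scaling:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 512, 20
x = rng.standard_normal(n)

def forward_var(weight_var):
    """Push a signal through `depth` random ReLU layers whose weights have
    variance weight_var/fan_in; return the final activation variance."""
    h = x
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * np.sqrt(weight_var / n)
        h = np.maximum(W @ h, 0.0)   # ReLU zeroes ~half the pre-activations
    return h.var()

xavier = forward_var(1.0)  # Var(w) = 1/fan_in: second moment halves per layer
he = forward_var(2.0)      # Var(w) = 2/fan_in: the factor 2 offsets ReLU

assert xavier < he * 1e-3  # Xavier signal has collapsed by depth 20
assert he > 1e-3           # He scaling keeps the signal at a healthy size
```

Each ReLU discards roughly half the second moment of its pre-activations, so the extra factor of 2 in the weight variance is exactly what keeps the signal scale constant with depth.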
“Convolution Aware Initialization”, (2017-02-21):
Initialization of parameters in deep neural networks has been shown to have a big impact on the performance of the networks (Mishkin & Matas, 2015). The initialization scheme devised by He et al, allowed convolution activations to carry a constrained mean which allowed deep networks to be trained effectively (He et al., 2015a). Orthogonal initializations and more generally orthogonal matrices in standard recurrent networks have been proved to eradicate the vanishing and exploding gradient problem (Pascanu et al., 2012). Majority of current initialization schemes do not take fully into account the intrinsic structure of the convolution operator. Using the duality of the Fourier transform and the convolution operator, Convolution Aware Initialization builds orthogonal filters in the Fourier space, and using the inverse Fourier transform represents them in the standard space. With Convolution Aware Initialization we noticed not only higher accuracy and lower loss, but faster convergence. We achieve new state of the art on the CIFAR10 dataset, and achieve close to state of the art on various other tasks.
Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model’s architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and Imagenet32×32, achieving competitive performance with similarly-sized hand-designed networks. Our code is available at https://github.com/ajbrock/SMASH.
“CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and GPT-3.”, (2021-01-05):
[CLIP paper] We present a neural network that aims to address these problems: it is trained on a wide variety of images with a wide variety of natural language supervision that’s abundantly available on the internet. By design, the network can be instructed in natural language to perform a great variety of classification benchmarks, without directly optimizing for the benchmark’s performance, similar to the “zero-shot” capabilities of GPT-2 and GPT-3. This is a key change: by not directly optimizing for the benchmark, we show that it becomes much more representative: our system closes this “robustness gap” by up to 75% while matching the performance of the original ResNet50 on ImageNet zero-shot without using any of the original 1.28M labeled examples.
Approach: We show that scaling a simple pre-training task is sufficient to achieve competitive zero-shot performance on a great variety of image classification datasets. Our method uses an abundantly available source of supervision: the text paired with images found across the internet. This data is used to create the following proxy training task for CLIP: given an image, predict which out of a set of 32,768 randomly sampled text snippets, was actually paired with it in our dataset.
In order to solve this task, our intuition is that CLIP models will need to learn to recognize a wide variety of visual concepts in images and associate them with their names. As a result, CLIP models can then be applied to nearly arbitrary visual classification tasks. For instance, if the task of a dataset is classifying photos of dogs vs cats, we check for each image whether a CLIP model predicts the text description “a photo of a dog” or “a photo of a cat” is more likely to be paired with it.
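The proxy task amounts to a symmetric contrastive loss over a batch of (image, text) embedding pairs; a minimal numpy sketch (the function `clip_style_loss` is our illustrative name, not OpenAI's implementation, and the embeddings here are random stand-ins for encoder outputs):

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of (image, text) pairs:
    the i-th image should be most similar to the i-th text, and vice versa."""
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature         # (batch, batch) similarities

    def xent(l):
        # Cross-entropy against the diagonal (the correct pairings).
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return (xent(logits) + xent(logits.T)) / 2  # both matching directions

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 32))
# Perfectly aligned pairs score a much lower loss than mismatched pairs.
aligned = clip_style_loss(emb, emb)
shuffled = clip_style_loss(emb, np.roll(emb, 1, axis=0))
assert aligned < shuffled
```

At test time the same similarity scores are reused for zero-shot classification: the class names are embedded as text (“a photo of a dog”, …) and each image is assigned the class whose text embedding it matches best.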
- CLIP is highly efficient…In the end, our best performing CLIP model trains on 256 GPUs for 2 weeks which is similar to existing large scale image models.
- CLIP is flexible and general: Because they learn a wide range of visual concepts directly from natural language, CLIP models are substantially more flexible and general than existing models. We find they are able to zero-shot perform many different tasks. To validate this we have measured CLIP’s zero-shot performance on over 30 different datasets including tasks such as fine-grained object classification, geo-localization, action recognition in videos, and OCR. [While CLIP’s zero-shot OCR performance is mixed, its semantic OCR representation is quite useful. When evaluated on the SST-2 NLP dataset rendered as images, a linear classifier on CLIP’s representation matches a CBoW model with direct access to the text. CLIP is also competitive at detecting hateful memes without needing ground truth text.] In particular, learning OCR is an example of an exciting behavior that does not occur in standard ImageNet models.
…CLIP allows people to design their own classifiers and removes the need for task-specific training data. [See also “AudioCLIP: Extending CLIP to Image, Text and Audio”, Guzhov et al 2021; CLIP notebook compilation for art, “Alien Dreams: An Emerging Art Scene”/“AI Generated Art Scene Explodes as Hackers Create Groundbreaking New Tools”.]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
“LipNet: End-to-End Sentence-level Lipreading”, (2016-11-05):
Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on end-to-end trained models performs only word classification, rather than sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy in sentence-level, overlapped speaker split task, outperforming experienced human lipreaders and the previous 86.4% word-level state-of-the-art accuracy (Gergen et al., 2016).
“Distilling the Knowledge in a Neural Network”, (2015-03-09):
A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
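The key ingredient of distillation is a temperature-softened softmax over the teacher's logits; a minimal numpy sketch (the logit values are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_targets(teacher_logits, T=4.0):
    """Soften the teacher's logits with temperature T; higher T exposes
    the teacher's relative similarity judgments between wrong classes."""
    return softmax(teacher_logits / T)

teacher = np.array([10.0, 5.0, -2.0])        # a confident teacher
hard = distillation_targets(teacher, T=1.0)  # ordinary softmax, near one-hot
soft = distillation_targets(teacher, T=4.0)  # softened "dark knowledge"

# Soft targets carry much more information about the non-argmax classes,
# while preserving the teacher's ranking of classes.
assert soft[1] > hard[1] and soft[2] > hard[2]
assert int(np.argmax(soft)) == int(np.argmax(hard)) == 0
```

The student is then trained to match these soft targets (with the same temperature in its own softmax), usually alongside the ordinary cross-entropy on the true labels.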
“Do Deep Nets Really Need to be Deep?”, (2013-12-21):
Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this extended abstract, we show that shallow feed-forward networks can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow neural nets can learn these deep functions using a total number of parameters similar to the original deep model. We evaluate our method on the TIMIT phoneme recognition task and are able to train shallow fully-connected nets that perform similarly to complex, well-engineered, deep convolutional architectures. Our success in training shallow neural nets to mimic deeper models suggests that there probably exist better algorithms for training shallow feed-forward nets than those currently available.
“You and Your Research”, (1986-03-07):
[Transcript of a talk by mathematician and Bell Labs manager Richard Hamming about what he had learned about computers and how to do effective research (republished in expanded form as Art of Doing Science and Engineering: Learning to Learn; 1995 video). It is one of the most famous and most-quoted such discussions ever.]
At a seminar in the Bell Communications Research Colloquia Series, Dr. Richard W. Hamming, a Professor at the Naval Postgraduate School in Monterey, California and a retired Bell Labs scientist, gave a very interesting and stimulating talk, ‘You and Your Research’ to an overflow audience of some 200 Bellcore staff members and visitors at the Morris Research and Engineering Center on March 7, 1986. This talk centered on Hamming’s observations and research on the question “Why do so few scientists make substantial contributions and so many are forgotten in the long run?” From his more than 40 years of experience, 30 of which were at Bell Laboratories, he has made a number of direct observations, asked very pointed questions of scientists about what, how, and why they did things, studied the lives of great scientists and great contributions, and has done introspection and studied theories of creativity. The talk is about what he has learned in terms of the properties of the individual scientists, their abilities, traits, working habits, attitudes, and philosophy.
2012-woodley.pdf: “The social and scientific temporal correlates of genotypic intelligence and the Flynn effect”, Michael A. Woodley
How do genes affect cognitive ability or other human quantitative traits such as height or disease risk? Progress on this challenging question is likely to be significant in the near future. I begin with a brief review of psychometric measurements of intelligence, introducing the idea of a “general factor” or g score. The main results concern the stability, validity (predictive power), and heritability of adult g. The largest component of genetic variance for both height and intelligence is additive (linear), leading to important simplifications in predictive modeling and statistical estimation. Due mainly to the rapidly decreasing cost of genotyping, it is possible that within the coming decade researchers will identify loci which account for a significant fraction of total g variation. In the case of height analogous efforts are well under way. I describe some unpublished results concerning the genetic architecture of height and cognitive ability, which suggest that roughly 10k moderately rare causal variants of mostly negative effect are responsible for normal population variation. Using results from Compressed Sensing (L1-penalized regression), I estimate the statistical power required to characterize both linear and nonlinear models for quantitative traits. The main unknown parameter s (sparsity) is the number of loci which account for the bulk of the genetic variation. The required sample size is of order 100s, or roughly a million in the case of cognitive ability.
2015-henn.pdf: “Estimating the mutation load in human genomes”, Brenna M. Henn, Laura R. Botigué, Carlos D. Bustamante, Andrew G. Clark, Simon Gravel
2014-simons.pdf: “The deleterious mutation load is insensitive to recent population history”, Yuval B. Simons, Michael C. Turchin, Jonathan K. Pritchard, Guy Sella
2012-fu.pdf: “Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants”, Wenqing Fu, Timothy D. O’Connor, Goo Jun, Hyun Min Kang, Goncalo Abecasis, Suzanne M. Leal, Stacey Gabriel, David Altshuler, Jay Shendure, Deborah A. Nickerson, Michael J. Bamshad, NHLBI Sequencing Project, Joshua M. Akey
“The Biodemography of Fertility: A Review and Future Research Frontiers”, (2015-09-21):
The social sciences have been reticent to integrate a biodemographic approach to the study of fertility choice and behaviour, resulting in theories and findings that are largely socially-deterministic. The aim of this paper is to first reflect on reasons for this lack of integration, provide a review of previous examinations, take stock of what we have learned until now and propose future research frontiers. We review the early foundations of proximate determinants followed by behavioural genetic (family and twin) studies that isolated the extent of genetic influence on fertility traits. We then discuss research that considers gene and environment interaction and the importance of cohort and country-specific estimates, followed by multivariate models that explore motivational precursors to fertility and education. The next section on molecular genetics reviews fertility-related candidate gene studies and their shortcomings and on-going work on genome wide association studies. Work in evolutionary anthropology and biology is then briefly examined, focusing on evidence for natural selection. Biological and genetic factors are relevant in explaining and predicting fertility traits, with socio-environmental factors and their interaction still key in understanding outcomes. Studying the interplay between genes and the environment, new data sources and integration of new methods will be central to understanding and predicting future fertility trends.
[Keywords: fertility, age at first birth, number of children ever born, genetics, behavioural genetics, molecular genetics, natural selection]
Precision medicine necessitates large-scale collections of genomes and phenomes. Despite decreases in the costs of genomic technologies, collecting these types of information at scale is still a daunting task that poses logistical challenges and requires consortium-scale resources. Here, we describe DNA.Land, a digital biobank that collects genomes and phenomes with a fraction of the resources of traditional studies at the same scale. Our approach relies on crowd-sourcing data from the rapidly growing number of individuals who have access to their own genomic datasets through Direct-to-Consumer (DTC) companies. To recruit participants, we developed a series of automatic return-of-results features in DNA.Land that increase users’ engagement while satisfying human subject research protections. So far, DNA.Land has collected over 43,000 genomes in 20 months of operation, orders of magnitude more than previous digital attempts by academic groups. We report lessons learned in running a digital biobank, our technical framework, and our approach regarding ethical, legal, and social implications.
The study of continuously varying, quantitative traits is important in evolutionary biology, agriculture, and medicine. Variation in such traits is attributable to many, possibly interacting, genes whose expression may be sensitive to the environment, which makes their dissection into underlying causative factors difficult. An important population parameter for quantitative traits is heritability, the proportion of total variance that is due to genetic factors. Response to artificial and natural selection and the degree of resemblance between relatives are all functions of this parameter. Following the classic paper by R. A. Fisher in 1918, the estimation of additive and dominance genetic variance and heritability in populations is based upon the expected proportion of genes shared between different types of relatives, and explicit, often controversial and untestable models of genetic and non-genetic causes of family resemblance. With genome-wide coverage of genetic markers it is now possible to estimate such parameters solely within families using the actual degree of identity-by-descent sharing between relatives. Using genome scans on 4,401 quasi-independent sib pairs of which 3,375 pairs had phenotypes, we estimated the heritability of height from empirical genome-wide identity-by-descent sharing, which varied from 0.374 to 0.617 (mean 0.498, standard deviation 0.036). The variance in identity-by-descent sharing per chromosome and per genome was consistent with theory. The maximum likelihood estimate of the heritability for height was 0.80 with no evidence for non-genetic causes of sib resemblance, consistent with results from independent twin and family studies but using an entirely separate source of information. Our application shows that it is feasible to estimate genetic variance solely from within-family segregation and provides an independent validation of previously untestable assumptions. Given sufficient data, our new paradigm will allow the estimation of genetic variation for disease susceptibility and quantitative traits that is free from confounding with non-genetic factors and will allow partitioning of genetic variation into additive and non-additive components.
Quantitative geneticists attempt to understand variation between individuals within a population for traits such as height in humans and the number of bristles in fruit flies. This has been traditionally done by partitioning the variation in underlying sources due to genetic and environmental factors, using the observed amount of variation between and within families. A problem with this approach is that one can never be sure that the estimates are correct, because nature and nurture can be confounded without one knowing it. The authors got around this problem by comparing the similarity between relatives as a function of the exact proportion of genes that they have in common, looking only within families. Using this approach, the authors estimated the amount of total variation for height in humans that is due to genetic factors from 3,375 sibling pairs. For each pair, the authors estimated the proportion of genes that they share from DNA markers. It was found that about 80% of the total variation can be explained by genetic factors, close to results that are obtained from classical studies. This study provides the first validation of an estimate of genetic variation by using a source of information that is free from nature–nurture assumptions.
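The within-family design described above can be illustrated with a toy simulation. The sketch below uses hypothetical parameters and a simplified Haseman-Elston-style regression of squared sib-pair phenotype differences on genome-wide IBD sharing (not the paper's actual maximum-likelihood estimator), recovering a true heritability of 0.8 purely from variation in sharing around the expected 50%:

```python
import random
random.seed(1)

N, VA, VE = 200_000, 0.8, 0.2   # sib pairs; additive & environmental variances (true h^2 = 0.8)

pis, sqdiffs = [], []
for _ in range(N):
    # genome-wide IBD sharing for full sibs: mean 0.5, sd ~0.036 (as reported in the paper)
    pi = min(max(random.gauss(0.5, 0.036), 0.0), 1.0)
    shared = random.gauss(0, 1)   # genetic component transmitted through shared IBD segments
    g1 = (pi ** 0.5) * shared + ((1 - pi) ** 0.5) * random.gauss(0, 1)
    g2 = (pi ** 0.5) * shared + ((1 - pi) ** 0.5) * random.gauss(0, 1)
    y1 = (VA ** 0.5) * g1 + (VE ** 0.5) * random.gauss(0, 1)
    y2 = (VA ** 0.5) * g2 + (VE ** 0.5) * random.gauss(0, 1)
    pis.append(pi)
    sqdiffs.append((y1 - y2) ** 2)

# Haseman-Elston-style regression: E[(y1-y2)^2] = 2*VE + 2*VA*(1 - pi),
# so the slope of squared differences on IBD sharing is -2*VA.
mx = sum(pis) / N
my = sum(sqdiffs) / N
slope = (sum((x - mx) * (y - my) for x, y in zip(pis, sqdiffs))
         / sum((x - mx) ** 2 for x in pis))
h2 = (-slope / 2) / (VA + VE)
print(round(h2, 2))   # close to the true 0.8, estimated from within-family segregation alone
```

Because the IBD sharing of full sibs varies only a little around 0.5 (sd ≈ 0.036), very large samples are needed for a precise estimate, which is why the paper's standard errors are wide relative to twin studies.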
Although the expected relationship or proportion of genome shared by pairs of relatives can be obtained from their pedigrees, the actual quantities deviate as a consequence of Mendelian sampling and depend on the number of chromosomes and the map length. Formulae have been published previously for the variance of actual relationship for a number of specific types of relatives but no general formula for non-inbred individuals is available. We provide here a unified framework that enables the variances for distant relatives to be easily computed, showing, for example, how the variances of sharing for great grandparent-great grandchild, great uncle-great nephew, half uncle-nephew and first cousins differ, even though they have the same expected relationship. Results are extended in order to include differences in map length between sexes, no recombination in males, and sex linkage. We derive the magnitude of skew in the proportion shared, showing the skew becomes increasingly large the more distant the relationship. The results obtained for variation in actual relationship apply directly to the variation in actual inbreeding as both are functions of genomic coancestry, and we show how to partition the variation in actual inbreeding between and within families. Although the mean of actual relationship falls as individuals become more distant, its coefficient of variation rises, and so, exacerbated by the skewness, it becomes increasingly difficult to distinguish different pedigree relationships from the actual fraction of the genome shared.
A current concern in genetic epidemiology studies in admixed populations is that population stratification can lead to spurious results. The Brazilian census classifies individuals according to self-reported “color”, but several studies have demonstrated that stratifying according to “color” is not a useful strategy to control for population structure, due to the dissociation between self-reported “color” and genomic ancestry. We report the results of a study in a group of Brazilian siblings in which we measured skin pigmentation using a reflectometer, and estimated genomic ancestry using 21 Ancestry Informative Markers (AIMs). Self-reported “color”, according to the Brazilian census, was also available for each participant. This made it possible to evaluate the relationship between self-reported “color” and skin pigmentation, self-reported “color” and genomic ancestry, and skin pigmentation and genomic ancestry. We observed that, although there were statistically-significant differences between the three “color” groups in genomic ancestry and skin pigmentation, there was considerable dispersion within each group and substantial overlap between groups. We also saw that there was no good agreement between the “color” categories reported by each member of the sibling pair: 30 out of 86 sibling pairs reported different “color”, and in some cases, the sibling reporting the darker “color” category had lighter skin pigmentation. Socioeconomic status was statistically-significantly associated with self-reported “color” and genomic ancestry in this sample. This and other studies show that subjective classifications based on self-reported “color”, such as the one that is used in the Brazilian census, are inadequate to describe the population structure present in recently admixed populations. 
Finally, we observed that one of the AIMs included in the panel (rs1426654), which is located in the known pigmentation gene SLC24A5, was strongly associated with skin pigmentation in this sample.
Objective: Genetic variation influences differential vulnerability to addiction within populations. However, it remains unclear whether differences in frequencies of vulnerability alleles contribute to disparities between populations and to what extent ancestry correlates with differential exposure to environmental risk factors, including poverty and trauma.
Method: The authors used 186 ancestry-informative markers to measure African ancestry in 407 addicts and 457 comparison subjects self-identified as African Americans. The reference group was 1,051 individuals from the Human Genome Diversity Cell Line Panel, which includes 51 diverse populations representing most worldwide genetic diversity.
Results: African Americans varied in degrees of African, European, Middle Eastern, and Central Asian genetic heritage. The overall level of African ancestry was actually smaller among cocaine, opiate, and alcohol addicts (proportion = 0.76-0.78) than nonaddicted African American comparison subjects (proportion = 0.81). African ancestry was associated with living in impoverished neighborhoods, a factor previously associated with risk. There was no association between African ancestry and exposure to childhood abuse or neglect, a factor that strongly predicted all types of addictions.
Conclusions: These results suggest that African genetic heritage does not increase the likelihood of genetic risk for addictions. They highlight the complex interrelation between genetic ancestry and social, economic, and environmental conditions and the strong relation of those factors to addiction. Studies of epidemiological samples characterized for genetic ancestry and social, psychological, demographic, economic, cultural, and historical factors are needed to better disentangle the effects of genetic and environmental factors underlying interpopulation differences in vulnerability to addiction and other health disparities.
The role of race in human genetics and biomedical research is among the most contested issues in science. Much debate centers on the relative importance of genetic versus sociocultural factors in explaining racial inequalities in health. However, few studies integrate genetic and sociocultural data to test competing explanations directly.
We draw on ethnographic, epidemiologic, and genetic data collected in southeastern Puerto Rico to isolate two distinct variables for which race is often used as a proxy: genetic ancestry versus social classification. We show that color, an aspect of social classification based on the culturally defined meaning of race in Puerto Rico, better predicts blood pressure than does a genetic-based estimate of continental ancestry. We also find that incorporating sociocultural variables reveals a new and statistically-significant association between a candidate gene polymorphism for hypertension (α2C adrenergic receptor deletion) and blood pressure.
This study addresses the recognized need to measure both genetic and sociocultural factors in research on racial inequalities in health. Our preliminary results provide the most direct evidence to date that previously reported associations between genetic ancestry and health may be attributable to sociocultural factors related to race and racism, rather than to functional genetic differences between racially defined groups. Our results also imply that including sociocultural variables in future research may improve our ability to detect statistically-significant allele-phenotype associations. Thus, measuring sociocultural factors related to race may both empower future genetic association studies and help to clarify the biological consequences of social inequalities.
Background: African-Americans generally have lower circulating levels of 25 hydroxyvitamin D [25(OH)D] than Whites, attributed to skin pigmentation and dietary habits. Little is known about the genetic determinants of 25(OH)D levels nor whether the degree of African ancestry associates with circulating 25(OH)D.
Methods: With the use of a panel of 276 ancestry informative genetic markers, we estimated African and European admixture for a sample of 758 African-American and non-Hispanic White Southern Community Cohort Study participants. For African-Americans, cut points of <85%, 85% to 95%, and >or = 95% defined low, medium, and high African ancestry, respectively. We estimated the association between African ancestry and 25(OH)D and also explored whether vitamin D exposure (sunlight, diet) had varying effects on 25(OH)D levels dependent on ancestry level.
Results: The mean serum 25(OH)D levels among Whites and among African-Americans of low, medium, and high African ancestry were 27.2, 19.5, 18.3, and 16.5 ng/mL, respectively. Serum 25(OH)D was estimated to decrease by 1.0 to 1.1 ng/mL per 10% increase in African ancestry. The effect of high vitamin D exposure from sunlight and diet was 46% lower among African-Americans with high African ancestry than among those with low/medium ancestry.
Conclusions: We found novel evidence that the level of African ancestry may play a role in clinical vitamin D status.
Impact: This is the first study to describe how 25(OH)D levels vary in relation to genetic estimation of African ancestry. Further study is warranted to replicate these findings and uncover the potential pathways involved.
Objective: The objective of this study was to investigate whether differences in admixture in African-American (AFA) and Hispanic-American (HA) adult women are associated with adiposity and adipose distribution.
Design: The proportion of European, sub-Saharan African and Amerindian admixture was estimated for AFA and HA women in the Women’s Heath Initiative using 92 ancestry informative markers. Analyses assessed the relationship between admixture and adiposity indices.
Subjects: The subjects included 11 712 AFA and 5088 HA self-identified post-menopausal women.
Results: There was a statistically-significant positive association between body mass index (BMI) and African admixture when admixture was considered as a continuous variable, and age, education, physical activity, parity, family income and smoking were included as covariates (p < 10^−4). A dichotomous model (upper and lower BMI quartiles) showed that African admixture was associated with a high odds ratio (OR = 3.27 for 100% admixture compared with 0% admixture; 95% confidence interval 2.08–5.15). For HA, there was no association between BMI and admixture. In contrast, when waist-to-hip ratio (WHR) was used as a measure of adipose distribution, there was no association between WHR and admixture in AFA, but there was a strong association in HA (p < 10^−4; OR for Amerindian admixture = 5.93, confidence interval 3.52–9.97).
Conclusion: These studies show that: (1) African admixture is associated with BMI in AFA women; (2) Amerindian admixture is associated with WHR but not BMI in HA women; and (3) it may be important to consider different measurements of adiposity and adipose distribution in different ethnic population groups.
“The Great Migration and African-American Genomic Diversity”, (2016-04-26):
We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West.
Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus track north-bound and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ~15–16km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance.
Genetic studies of African-Americans identify functional variants, elucidate historical and genealogical mysteries, and reveal basic biology. However, African-Americans have been under-represented in genetic studies, and relatively little is known about nation-wide patterns of genomic diversity in the population.
Here, we study African-American genomic diversity using genotype data from nationally and regionally representative cohorts. Access to these unique cohorts allows us to clarify the role of population structure, admixture, and recent massive migrations in shaping African-American genomic diversity and sheds new light on the genetic history of this population.
Narrative reports suggest that socioeconomic status (SES) is associated with biogeographic ancestry (BGA) in the Americas. If so, SES potentially acts as a confound that needs to be taken into account when evaluating the relation between medical outcomes and BGA. To explore how systematic BGA-SES associations are, a meta-analysis of American studies was conducted. 40 studies were identified, yielding a total of 64 independent samples with directions of associations, including 48 independent samples with effect sizes. An analysis of association directions found a high degree of consistency. The √n-weighted directions were 0.83 (K = 36), −0.81 (K = 41) and −0.82 (K = 39) for European, Amerindian and African BGA, respectively. An analysis of magnitudes found that European BGA was positively associated with SES, with a meta-analytic effect size of r = 0.18 [95% CI: 0.13 to 0.24, K = 28, n = 35,476.5], while both Amerindian and African BGA were negatively associated with SES, having meta-analytic effect sizes of −0.14 [−0.18 to −0.10, K = 31, n = 28,937.5] and −0.11 [−0.15 to −0.07, K = 28, n = 32,710.5], respectively. There was considerable cross-sample variation in effect sizes (mean I² = 92%), but the sample size was not sufficient for credible moderator analysis. Implications for future studies are discussed.
2007-nice-guidelines-ch8.pdf: “The guidelines manual - Chapter 8: Incorporating health economics in guidelines and assessing resource impact”, NICE
1991-defries.pdf: “Chapter 3: Colorado Reading Project: An Update”, J. C. DeFries, R. K. Olson, B. F. Pennington, S. D. Smith
2016-makel.pdf: “When Lightning Strikes Twice: Profoundly Gifted, Profoundly Accomplished”, (2016-07-01):
The educational, occupational, and creative accomplishments of the profoundly gifted participants (IQs ⩾ 160) in the Study of Mathematically Precocious Youth (SMPY) are astounding, but are they representative of equally able 12-year-olds? Duke University’s Talent Identification Program (TIP) identified 259 young adolescents who were equally gifted. By age 40, their life accomplishments also were extraordinary: Thirty-seven percent had earned doctorates, 7.5% had achieved academic tenure (4.3% at research-intensive universities), and 9% held patents; many were high-level leaders in major organizations. As was the case for the sample before them, differential ability strengths predicted their contrasting and eventual developmental trajectories—even though essentially all participants possessed both mathematical and verbal reasoning abilities far superior to those of typical Ph.D. recipients. Individuals, even profoundly gifted ones, primarily do what they are best at. Differences in ability patterns, like differences in interests, guide development along different paths, but ability level, coupled with commitment, determines whether and the extent to which noteworthy accomplishments are reached if opportunity presents itself.
[Keywords: intelligence, creativity, giftedness, replication, blink comparator]
1936-byrns.pdf: “Intelligence and Nationality of Wisconsin School Children”, Ruth Byrns
This article presents an estimate of the proportion of homonyms in large-scale groups, based on the distribution of first names and last names in a subset of those groups. The estimate generalizes the classic “birthday paradox” problem. The main result is that, in societies such as France or the United States, identity collisions (based on first + last names) are frequent: the large majority of the population has at least one homonym. In smaller settings, however, homonymy is much rarer: even though small groups of a few thousand people typically contain at least one pair of homonyms, only a few individuals in such a group have a homonym themselves.
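The estimate rests on the generalized birthday problem: given name frequencies p_i, the probability that none of n people share a full name is approximately exp(−n(n−1)/2 · Σp_i²). A minimal sketch of that approximation (the 50,000 equally-likely full names are a made-up illustration, not data from the paper; real name distributions are heavy-tailed, which only makes collisions more likely):

```python
import math

def p_collision(n, name_probs):
    """Generalized birthday problem: probability that at least two of n people
    share the same (first + last) name, given the name-frequency distribution.
    Uses the Poisson approximation P(no collision) ~ exp(-C(n,2) * sum p_i^2)."""
    s = sum(p * p for p in name_probs)
    return 1 - math.exp(-n * (n - 1) / 2 * s)

# Hypothetical illustration: 50,000 equally likely full names.
names = [1 / 50_000] * 50_000
print(p_collision(3_000, names))   # a few-thousand-person group: a shared name is near-certain
print(p_collision(100, names))     # a 100-person group: a shared name is unlikely
```

This matches the article's asymmetry: a collision somewhere in a moderately-sized group is nearly guaranteed, even though the chance that any *particular* member has a homonym remains small.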
2007-lensberg.pdf: “On the Evolution of Investment Strategies and the Kelly Rule—A Darwinian Approach”, Terje Lensberg, Klaus Reiner Schenk-Hoppé
2003-murray-humanaccomplishment.pdf: “Human Accomplishment”, Charles Murray
The brms package provides an interface to fit Bayesian generalized (non-)linear multivariate multilevel models using Stan, which is a C++ package for performing full Bayesian inference (see http://mc-stan.org/). The formula syntax is very similar to that of the package lme4 to provide a familiar and simple interface for performing regression analyses. A wide range of response distributions are supported, allowing users to fit—among others—linear, robust linear, count data, survival, response times, ordinal, zero-inflated, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, missing value imputation, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Multivariate models (i.e., models with multiple response variables) can be fit, as well. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks, cross-validation, and Bayes factors.
“Indefinite survival through backup copies”, (2012-06-06):
If an individual entity endures a fixed probability μ < 1 of disappearing (“dying”) in a given fixed time period, then, as time approaches infinity, the probability of death approaches certainty. One approach to avoid this fate is for individuals to copy themselves into different locations; if the copies each have an independent probability of dying, then the total risk is much reduced. However, to avoid the same ultimate fate, the entity must continue copying itself to continually reduce the risk of death. In this paper, we show that to get a non-zero probability of ultimate survival, it suffices that the number of copies grows logarithmically with time. Accounting for expected copy casualties, the required rate of copying is hence bounded.
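The logarithmic-copying claim is easy to check numerically. In the simplified model below (my own toy rendering of the setup, not the paper's exact formalism), each of n(t) copies dies independently with probability μ each period, the entity is lost only if every copy dies in the same period, and n(t) = max(1, ⌈c·ln t⌉). The survival probability ∏_t (1 − μ^n(t)) then converges to a non-zero limit whenever c > 1/ln(1/μ), because the per-period loss probabilities μ^n(t) = t^(−c·ln(1/μ)) form a convergent series:

```python
import math

def survival_probability(mu, c, horizon):
    """Probability of surviving `horizon` periods when each of n(t) copies
    independently dies with probability mu per period, and the copy count
    grows logarithmically: n(t) = max(1, ceil(c * ln t))."""
    p = 1.0
    for t in range(1, horizon + 1):
        n = max(1, math.ceil(c * math.log(t)))
        p *= 1 - mu ** n   # survive period t unless all n copies die at once
    return p

mu, c = 0.5, 4   # 50% per-period death risk; c > 1/ln(2) ~ 1.44, so survival is positive
print(survival_probability(mu, c, 1_000))
print(survival_probability(mu, c, 100_000))   # essentially unchanged: the total risk has converged
```

Extending the horizon a hundredfold barely moves the result, illustrating that the residual risk beyond the early periods sums to almost nothing once the copy count grows logarithmically.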
1967-samuelson.pdf: “General Proof that Diversification Pays”, Paul Samuelson
1999-ross.pdf: “Adding Risks: Samuelson's Fallacy of Large Numbers Revisited”, Stephen A. Ross
2013-leinonen.pdf: “Q~ST~-F~ST~ comparisons: evolutionary and ecological insights from genomic heterogeneity”, Tuomas Leinonen, R. J. Scott McCairns, Robert B. O'Hara, Juha Merilä
Urbanization significantly alters natural ecosystems and has accelerated globally. Urban wildlife populations are often highly fragmented by human infrastructure, and isolated populations may adapt in response to local urban pressures. However, relatively few studies have identified genomic signatures of adaptation in urban animals. We used a landscape genomics approach to examine signatures of selection in urban populations of white-footed mice (Peromyscus leucopus) in New York City. We analyzed 154,770 SNPs identified from transcriptome data from 48 P. leucopus individuals from three urban and three rural populations, and used outlier tests to identify evidence of urban adaptation. We accounted for demography by simulating a neutral SNP dataset under an inferred demographic history as a null model for outlier analysis. We also tested whether candidate genes were associated with environmental variables related to urbanization. In total, we detected 381 outlier loci and after stringent filtering, identified and annotated 19 candidate loci. Many of the candidate genes were involved in metabolic processes, and have well-established roles in metabolizing lipids and carbohydrates. Our results indicate that white-footed mice in NYC are adapting at the biomolecular level to local selective pressures in urban habitats. Annotation of outlier loci suggest selection is acting on metabolic pathways in urban populations, likely related to novel diets in cities that differ from diets in less disturbed areas.
2013-walsh-book2-ch14-draft.pdf: “Chapter 14. Short-term Changes in the Mean: 2. Truncation and Threshold Selection [2013 draft]”, (2013):
This brief chapter first considers the theory of truncation selection on the mean, which is of general interest, and then examines a number of more specialized topics that may be skipped by the casual reader. Truncation selection (Figure 14.1) occurs when all individuals on one side of a threshold are chosen, and is by far the commonest form of artificial selection in breeding and laboratory experiments. One key result is that for a normally-distributed trait, the selection intensity ī is fully determined by the fraction p saved (Equation 14.3a), provided that the chosen number of adults is large. This allows a breeder or experimentalist to predict the expected response given their choice of p.
The remaining topics are loosely organized around the theme of selection intensity and threshold selection. First, when a small number of adults are chosen to form the next generation, Equation 14.3a overestimates the expected ī, and we discuss how to correct for this small sample effect. This correction is important when only a few individuals form the next generation, but is otherwise relatively minor. The rest of the chapter considers the response in discrete traits. We start with a binary (present/absence) trait, and show how an underlying liability model can be used to predict response. We also examine binary trait response in a logistic regression framework (estimating the probability of showing the trait given some underlying liability scores) and the evolution of both the mean value on the liability scale and the threshold value. We conclude with a few brief comments on response when a trait is better modeled as Poisson, rather than normally, distributed…In addition to being the commonest form of artificial selection, truncation selection is also the most efficient, giving the largest selection intensity of any scheme culling the same fraction of individuals from a population (Kimura & Crow 1978; Crow & Kimura 1979).
[Preprint chapter of Evolution and Selection of Quantitative Traits, Lynch & Walsh 2018]
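For a normally-distributed trait, the large-sample relation the chapter describes, ī = φ(z)/p where z is the truncation point with Φ(z) = 1 − p, is a one-liner to compute. A sketch using only Python's standard library (no breeding software assumed; `statistics.NormalDist` requires Python ≥ 3.8):

```python
from statistics import NormalDist

def selection_intensity(p):
    """Selection intensity under truncation selection on a normal trait
    when the upper fraction p is saved (large-sample form):
    i = phi(z) / p, where z satisfies Phi(z) = 1 - p."""
    z = NormalDist().inv_cdf(1 - p)   # truncation threshold, in phenotypic SDs
    return NormalDist().pdf(z) / p

# Saving the top 20% of candidates:
print(round(selection_intensity(0.20), 2))   # -> 1.4, the standard tabulated value
```

As the fraction saved shrinks, ī rises only slowly (e.g. saving the top 1% gives ī ≈ 2.67), which is why very intense truncation yields diminishing returns per individual culled.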
Over 12 years, my self-experimentation found new and useful ways to improve sleep, mood, health, and weight. Why did it work so well? First, my position was unusual. I had the subject-matter knowledge of an insider, the freedom of an outsider, and the motivation of a person with the problem. I did not need to publish regularly. I did not want to display status via my research. Second, I used a powerful tool. Self-experimentation about the brain can test ideas much more easily (by a factor of about 500,000) than conventional research about other parts of the body. When you gather data, you sample from a power-law-like distribution of progress. Most data helps a little; a tiny fraction of data helps a lot. My subject-matter knowledge and methodological skills (e.g., in data analysis) improved the distribution from which I sampled (i.e., increased the average amount of progress per sample). Self-experimentation allowed me to sample from it much more often than conventional research. Another reason my self-experimentation was unusually effective is that, unlike professional science, it resembled the exploration of our ancestors, including foragers, hobbyists, and artisans.
This paper examines the joint evolution of emigration and individualism in Scandinavia during the Age of Mass Migration (1850–1920). A long-standing hypothesis holds that people of a stronger individualistic mindset are more likely to migrate as they suffer lower costs of abandoning existing social networks. Building on this hypothesis, I propose a theory of cultural change where migrant self-selection generates a relative push away from individualism, and towards collectivism, in migrant-sending locations through a combination of initial distributional effects and channels of intergenerational cultural transmission. Due to the interdependent relationship between emigration and individualism, emigration is furthermore associated with cultural convergence across subnational locations. I combine various sources of empirical data, including historical population census records and passenger lists of emigrants, and test the relevant elements of the proposed theory at the individual and subnational district level, and in the short and long run. Together, the empirical results suggest that individualists were more likely to migrate than collectivists, and that the Scandinavian countries would have been considerably more individualistic and culturally diverse, had emigration not taken place.
[Keywords: culture, individualism, migration, selection, economic history]
We study the political effects of mass emigration to the United States in the nineteenth century using data from Sweden. To instrument for total emigration over several decades, we exploit severe local frost shocks that sparked an initial wave of emigration, interacted with within-country travel costs. Our estimates show that emigration substantially increased the local demand for political change, as measured by labor movement membership, strike participation, and voting. Emigration also led to de facto political change, increasing welfare expenditures as well as the likelihood of adopting more inclusive political institutions.
“Genetic Consequences of Social Stratification in Great Britain”, (2018-10-30):
Human DNA varies across geographic regions, with most variation observed so far reflecting distant ancestry differences. Here, we investigate the geographic clustering of genetic variants that influence complex traits and disease risk in a sample of ~450,000 individuals from Great Britain. Out of 30 traits analyzed, 16 show statistically-significant geographic clustering at the genetic level after controlling for ancestry, likely reflecting recent migration driven by socio-economic status (SES). Alleles associated with educational attainment (EA) show the most clustering, with EA-decreasing alleles clustering in lower-SES areas such as coal mining areas. Individuals who leave coal mining areas carry more EA-increasing alleles on average than the rest of Great Britain. In addition, we leveraged the geographic clustering of complex trait variation to further disentangle regional differences in socio-economic and cultural outcomes through genome-wide association studies on publicly available regional measures, namely coal mining, religiousness, 1970/2015 general election outcomes, and Brexit referendum results.