Genomic selection—the prediction of breeding values using DNA polymorphisms—is a disruptive method that has widely been adopted by animal and plant breeders to increase crop, forest and livestock productivity and ultimately secure food and energy supplies. It improves breeding schemes in different ways, depending on the biology of the species and genotyping and phenotyping constraints. However, both genomic selection and classical phenotypic selection remain difficult to implement because of the high genotyping and phenotyping costs that typically occur when selecting large collections of individuals, particularly in early breeding generations. To specifically address these issues, we propose a new conceptual framework called phenomic selection, which consists of a prediction approach based on low-cost and high-throughput phenotypic descriptors rather than DNA polymorphisms. We applied phenomic selection on two species of economic interest (wheat and poplar) using near-infrared spectroscopy on various tissues. We showed that one could reach accurate predictions in independent environments for developmental and productivity traits and tolerance to disease. We also demonstrated that under realistic scenarios, one could expect much higher genetic gains with phenomic selection than with genomic selection. Our work constitutes a proof of concept and is the first attempt at phenomic selection; it clearly provides new perspectives for the breeding community, as this approach is theoretically applicable to any organism and does not require any genotypic information.
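The core idea can be illustrated with a standard genomic-prediction model in which the marker matrix is replaced by spectra. The sketch below assumes ridge regression, a common whole-genome regression choice but not necessarily the model the authors used. It also uses purely simulated, uncorrelated "wavelengths", whereas real near-infrared spectra are highly collinear and usually preprocessed first. Predictive ability is taken here to be the correlation between predicted and observed values in individuals held out of training.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; returns (intercept, coefficients)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centre predictors so the intercept is not penalised
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def predictive_ability(X_train, y_train, X_test, y_test, lam):
    """Pearson correlation between predicted and observed values in the test set."""
    intercept, beta = ridge_fit(X_train, y_train, lam)
    y_hat = intercept + X_test @ beta
    return float(np.corrcoef(y_hat, y_test)[0, 1])

# Hypothetical simulated data: 500 individuals x 200 spectral "wavelengths".
rng = np.random.default_rng(0)
n, p = 500, 200
spectra = rng.normal(size=(n, p))
effects = rng.normal(scale=0.1, size=p)
trait = spectra @ effects + rng.normal(size=n)

# Train on the first 400 individuals, predict the remaining 100.
r = predictive_ability(spectra[:400], trait[:400], spectra[400:], trait[400:], lam=100.0)
print(f"predictive ability r = {r:.2f}")
```

Nothing in the regression is specific to DNA. Any matrix of heritable, cheaply measured descriptors can take the place of the marker matrix.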
Genealogies are likely the first, centuries-old “big data”, with their construction as old as human civilization itself. Globalization, and the identity crisis that ensued, turned many to online services, building family trees and investigating connections to historical records and other family trees. An explosion has been underway since the beginning of the century in the number and usage of websites offering such genealogical services. About 130 million users have together created almost four billion profiles for family members across the three most popular genealogy websites, Ancestry.com, MyHeritage, and Geni. More recent years have witnessed a similarly rapid increase in genetics-based services that address the same need to learn about familial relationships and ancestry. These vast amounts of crowdsourced data, often crowdfunded as well since users frequently pay for these services, offer ample scientific research opportunities that would otherwise require expensive data collection. In a paper published today in Science, Kaplanis et al. [2, 3] introduce a genealogical dataset based on processing 86 million public Geni profiles. Armed with this crowdsourced dataset, they address fundamental research questions.
Methods for using GWAS to estimate genetic correlations between pairwise combinations of traits have produced “atlases” of genetic architecture. Genetic atlases reveal pervasive pleiotropy, and genome-wide statistically-significant loci are often shared across different phenotypes. We introduce genomic structural equation modeling (Genomic SEM), a multivariate method for analyzing the joint genetic architectures of complex traits. Using formal methods for modeling covariance structure, Genomic SEM synthesizes genetic correlations and SNP-heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to identify variants with effects on general dimensions of cross-trait liability, boost power for discovery, and calculate more predictive polygenic scores. Finally, Genomic SEM can be used to identify loci that cause divergence between traits, aiding the search for what uniquely differentiates highly correlated phenotypes. We demonstrate several applications of Genomic SEM, including a joint analysis of summary statistics from five genetically correlated psychiatric traits. We identify 27 independent SNPs not previously identified in the univariate GWASs, 5 of which have been reported in other published GWASs of the included traits. Polygenic scores derived from Genomic SEM consistently outperform polygenic scores derived from GWASs of the individual traits. Genomic SEM is flexible, open ended, and allows for continuous innovations in how multivariate genetic architecture is modeled.
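Genomic SEM works from a genetic covariance matrix whose diagonal holds SNP-heritabilities and whose off-diagonal entries are genetic covariances. The minimal sketch below covers only this assembly step and uses hypothetical inputs. It builds the matrix from heritabilities and pairwise genetic correlations using cov_g(i, j) = r_g(i, j) · sqrt(h²_i · h²_j). It does not estimate the sampling covariance of these quantities or fit a structural model, both of which a real analysis requires.

```python
import math

def genetic_covariance(h2, rg):
    """Build a genetic covariance matrix S.

    h2: list of SNP-heritabilities (the diagonal of S).
    rg: dict mapping (i, j) index pairs to genetic correlations.
    Off-diagonal entries use cov_g(i, j) = rg_ij * sqrt(h2_i * h2_j).
    """
    k = len(h2)
    S = [[0.0] * k for _ in range(k)]
    for i in range(k):
        S[i][i] = h2[i]
    for (i, j), r in rg.items():
        S[i][j] = S[j][i] = r * math.sqrt(h2[i] * h2[j])
    return S

def to_correlation(S):
    """Rescale a covariance matrix back to a (genetic) correlation matrix."""
    k = len(S)
    return [[S[i][j] / math.sqrt(S[i][i] * S[j][j]) for j in range(k)] for i in range(k)]

# Hypothetical inputs for three traits (not values from the paper).
h2 = [0.20, 0.10, 0.05]
rg = {(0, 1): 0.6, (0, 2): 0.4, (1, 2): 0.5}
S = genetic_covariance(h2, rg)
for row in S:
    print(["%.4f" % v for v in row])
```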
2018-hysi.pdf: “Genome-wide association meta-analysis of individuals of European ancestry identifies new loci explaining a substantial fraction of hair color variation and heritability”, Pirro G. Hysi, Ana M. Valdes, Fan Liu, Nicholas A. Furlotte, David M. Evans, Veronique Bataille, Alessia Visconti, Gibran Hemani, George McMahon, Susan M. Ring, George Davey Smith, David L. Duffy, Gu Zhu, Scott D. Gordon, Sarah E. Medland, Bochao D. Lin, Gonneke Willemsen, Jouke Hottenga, Dragana Vuckovic, Giorgia Girotto, Ilaria Gandin, Cinzia Sala, Maria Pina Concas, Marco Brumat, Paolo Gasparini, Daniela Toniolo, Massimiliano Cocca, Antonietta Robino, Seyhan Yazar, Alex W. Hewitt, Yan Chen, Changqing Zeng, Andre G. Uitterlinden, M. Arfan Ikram, Merel A. Hamer, Cornelia M. Duijn, Tamar Nijsten, David A. Mackey, Mario Falchi, Dorret I. Boomsma, Nicholas G. Martin, David A. Hinds, Manfred Kayser, Timothy D. Spector
Social genetic effects (SGE, also called indirect genetic effects) are associations between the genotypes of one individual and the phenotype of another. SGE arise when two individuals interact and heritable traits of one influence the phenotype of the other. Recent studies have shown that SGE substantially contribute to phenotypic variation in humans and laboratory mice, which suggests that SGE, like direct genetic effects (DGE, effects of an individual’s genes on their own phenotype), are amenable to mapping. Using 170 phenotypes including behavioural, physiological and morphological traits measured in outbred laboratory mice, we empirically explored the potential and challenges of genome-wide association study of SGE (sgeGWAS) as a tool to discover novel mechanisms of social effects between unrelated individuals. For each phenotype we performed sgeGWAS, and dgeGWAS for comparison, identifying 21 genome-wide statistically-significant SGE associations for 17 phenotypes. Our results provide three main insights: first, SGE and DGE arise from partially different loci and/or loci with different effect sizes, which implies that the widely-studied mechanism of phenotypic “contagion” is not sufficient to explain all social effects. Second, several DGE associations but no SGE associations had large effects, suggesting sgeGWAS is unlikely to uncover “low hanging fruits”. Finally, a similar number of variants likely contribute to SGE and DGE. The analytical framework we developed in this study and the insights we gained from our analyses will inform the design, implementation and interpretation of sgeGWAS in this and other populations and species.
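A toy simulation makes the definition of an SGE concrete. In it, each mouse's phenotype depends on its own genotype (DGE) and on its cage-mate's genotype (SGE), and regressing the phenotype on both genotypes recovers the two effects. This is a simplified, hypothetical single-locus sketch. It ignores the polygenic background, relatedness and cage effects that a genome-wide sgeGWAS would have to model.

```python
import numpy as np

def simulate_pairs(n_pairs, maf, beta_dge, beta_sge, rng):
    """Simulate unrelated cage-mate pairs at one biallelic locus.

    Each individual's phenotype = beta_dge * own genotype
                                 + beta_sge * partner's genotype + noise.
    """
    g = rng.binomial(2, maf, size=(n_pairs, 2))  # 0/1/2 minor-allele counts
    own = g.reshape(-1).astype(float)
    partner = g[:, ::-1].reshape(-1).astype(float)  # swap columns: the other pair member
    y = beta_dge * own + beta_sge * partner + rng.normal(size=own.size)
    return own, partner, y

def fit_dge_sge(own, partner, y):
    """Ordinary least squares of y on own and partner genotypes (with intercept)."""
    X = np.column_stack([np.ones_like(own), own, partner])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], coef[2]  # (DGE estimate, SGE estimate)

rng = np.random.default_rng(1)
own, partner, y = simulate_pairs(20_000, 0.3, beta_dge=0.3, beta_sge=0.1, rng=rng)
dge_hat, sge_hat = fit_dge_sge(own, partner, y)
print(f"DGE estimate: {dge_hat:.3f}, SGE estimate: {sge_hat:.3f}")
```

Because the simulated cage-mates are unrelated, own and partner genotypes are independent, so neither effect leaks into the other's estimate. With related partners the two predictors would be correlated.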
Sleep is an essential homeostatically-regulated state of decreased activity and alertness conserved across animal species, and both short and long sleep duration associate with chronic disease and all-cause mortality [1, 2]. Defining genetic contributions to sleep duration could point to regulatory mechanisms and clarify causal disease relationships. Through genome-wide association analysis in 446,118 participants of European ancestry from the UK Biobank, we discover 78 loci for self-reported sleep duration that further impact accelerometer-derived measures of sleep duration, daytime inactivity duration, sleep efficiency and number of sleep bouts in a subgroup (n = 85,499) with up to 7-day accelerometry. Associations are enriched for genes expressed in several brain regions, and for pathways including striatum and subpallium development, mechanosensory response, dopamine binding, synaptic neurotransmission, catecholamine production, synaptic plasticity, and unsaturated fatty acid metabolism. Genetic correlation analysis indicates shared biological links between sleep duration and psychiatric, cognitive, anthropometric and metabolic traits, and Mendelian randomization highlights a causal link of longer sleep with schizophrenia.
Using data from 697,828 research participants from 23andMe and UK Biobank, we identified 351 loci associated with being a morning person, a behavioural indicator of a person’s underlying circadian rhythm. These loci were validated in 85,760 individuals with activity-monitor derived measures of sleep timing: the mean sleep timing of the 5% of individuals carrying the most “morningness” alleles was 25.1 minutes (95% CI: 22.5, 27.6) earlier than the 5% carrying the fewest. The loci were enriched for genes involved in circadian rhythm and insulin pathways, and those expressed in the retina, hindbrain, hypothalamus, and pituitary (all FDR < 1%). We provide some evidence that being a morning person was causally associated with reduced risk of schizophrenia (OR: 0.89; 95% CI: 0.82, 0.96), depression (OR: 0.94; 95% CI: 0.91, 0.98) and a lower age at last childbirth in women (β: −0.046 years; 95% CI: −0.067, −0.025), but was not associated with BMI (β: −4.6×10⁻⁴; 95% CI: −0.044, 0.043) or type 2 diabetes (OR: 1.00; 95% CI: 0.91, 1.1). This study offers new insights into the biology of circadian rhythms and disease links in humans.
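The validation statistic compares the extreme tails of a genetic score. The minimal sketch below assumes the score is an unweighted count of morningness alleles and uses hypothetical toy data rather than the study's. It sorts individuals by score and takes the difference in mean sleep timing between the top and bottom 5%.

```python
def extreme_group_difference(scores, outcomes, fraction=0.05):
    """Mean outcome of the top `fraction` of scores minus that of the bottom `fraction`.

    Ties at the cut-off are broken by input order, which is arbitrary.
    """
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and of equal length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, round(len(scores) * fraction))
    top = [outcomes[i] for i in order[-k:]]
    bottom = [outcomes[i] for i in order[:k]]
    return sum(top) / k - sum(bottom) / k

# Hypothetical toy data: 100 people with allele counts 0..99 and a sleep midpoint
# (minutes after midnight) that gets 0.5 min earlier per extra morningness allele.
scores = list(range(100))
sleep_midpoint = [200 - 0.5 * s for s in scores]
print(extreme_group_difference(scores, sleep_midpoint))  # -47.5 (negative = earlier)
```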
Different human populations facing similar environmental challenges have sometimes evolved convergent biological adaptations, for example hypoxia resistance at high altitudes and depigmented skin in northern latitudes on separate continents. The pygmy phenotype (small adult body size), a characteristic of hunter-gatherer populations inhabiting both African and Asian tropical rainforests, is often highlighted as another case of convergent adaptation in humans. However, the degree to which phenotypic convergence in this polygenic trait is due to convergent vs. population-specific genetic changes is unknown. To address this question, we analyzed high-coverage sequence data from the protein-coding portion of the genomes (exomes) of two pairs of populations, Batwa rainforest hunter-gatherers and neighboring Bakiga agriculturalists from Uganda, and Andamanese rainforest hunter-gatherers (Jarawa and Onge) and Brahmin agriculturalists from India. We observed signatures of convergent positive selection between the Batwa and Andamanese rainforest hunter-gatherers across the set of genes with annotated ‘growth factor binding’ functions (p < 0.001). Unexpectedly, for the rainforest groups we also observed convergent and population-specific signatures of positive selection in pathways related to cardiac development (e.g. ‘cardiac muscle tissue development’; p = 0.003). We hypothesize that the growth hormone sub-responsiveness likely underlying the pygmy phenotype may have led to compensatory changes in cardiac pathways, in which this hormone also plays an essential role. Importantly, we did not observe similar patterns of positive selection on sets of genes associated with either growth or cardiac development in the agriculturalist populations, indicating that our results most likely reflect a history of convergent adaptation to the similar ecology of rainforest hunter-gatherers rather than a more common or general evolutionary pattern for human populations.
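The convergence claims rest on gene-set tests, which ask whether a set of genes (e.g. ‘growth factor binding’) shows stronger selection than expected by chance. One generic way to ask this is a permutation test against random gene sets of the same size. The sketch below assumes each gene already has a per-gene selection score, which is a hypothetical input; the paper's specific statistic is not described here. Running the test separately in each population and requiring a signal in both is one simple reading of "convergent".

```python
import random

def gene_set_permutation_test(scores, gene_set, n_perm=10_000, seed=0):
    """One-sided permutation test: is the mean score of `gene_set` higher than
    that of random gene sets of the same size drawn from all scored genes?

    scores: dict gene -> per-gene selection score (hypothetical input).
    Returns (observed mean, permutation p-value).
    """
    members = [g for g in dict.fromkeys(gene_set) if g in scores]  # dedupe, keep scored genes
    if not members:
        raise ValueError("no genes in the set have scores")
    k = len(members)
    observed = sum(scores[g] for g in members) / k
    all_genes = list(scores)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, k)
        if sum(scores[g] for g in sample) / k >= observed:
            exceed += 1
    # Add-one correction keeps the p-value away from exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical toy data: 200 genes with increasing selection scores.
scores = {f"gene{i}": i / 200 for i in range(200)}
candidate_set = [f"gene{i}" for i in range(190, 200)]
observed, p = gene_set_permutation_test(scores, candidate_set, n_perm=2000)
print(f"mean score {observed:.3f}, permutation p = {p:.4f}")
```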
2018-ilardo.pdf: “Physiological and Genetic Adaptations to Diving in Sea Nomads”, Melissa A. Ilardo, Ida Moltke, Thorfinn S. Korneliussen, Jade Cheng, Aaron J. Stern, Fernando Racimo, Peter de Barros Damgaard, Martin Sikora, Andaine Seguin-Orlando, Simon Rasmussen, Inge C. L. van den Munckhof, Rob ter Horst, Leo A. B. Joosten, Mihai G. Netea, Suhartini Salingkat, Rasmus Nielsen, Eske Willerslev
The role played by natural selection in shaping present-day human populations has received extensive scrutiny [1, 2, 3], especially in the context of local adaptations. However, most studies to date assume, either explicitly or implicitly, that populations have been in their current locations long enough to adapt to local conditions, and that population sizes were large enough to allow for the action of selection. If these conditions were satisfied, not only should selection be effective at promoting local adaptations, but deleterious alleles should also be eliminated over time. To assess this prediction, the genomes of 2,062 individuals, including 1,179 ancient humans, were reanalyzed to reconstruct how frequencies of risk alleles and their homozygosity changed through space and time in Europe. While the overall deleterious homozygosity consistently decreased through space and time, risk alleles have shown a steady increase in frequency. Even the mutations that are predicted to be most deleterious fail to exhibit any significant decrease in frequency. These conclusions do not deny the existence of local adaptations, but highlight the limitations imposed by drift and range expansions on the strength of selection in purging the mutational load affecting human populations.
Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator’s output as a reward signal is the key to allowing the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real-world (MNIST, OMNIGLOT, CELEBA) and synthetic 3D datasets. A video of the agent can be found at YouTube.
2018-segler.pdf: “Planning chemical syntheses with deep neural networks and symbolic AI”, Marwin H. S. Segler, Mike Preuss, Mark P. Waller
“Learning to Navigate in Cities Without a Map”, (2018-03-31):
Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation (“I am here”) and a representation of the goal (“I am going there”). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present a deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage http://streetlearn.cc contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at https://github.com/deepmind/streetlearn
“PlaNet—Photo Geolocation with Convolutional Neural Networks”, (2016-02-17):
Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
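Casting geolocation as classification means mapping each training photo's coordinates to a cell label, then mapping a predicted label back to a location. PlaNet's cells are multi-scale. The sketch below instead assumes a fixed grid of equal-angle cells (10° by default, a hypothetical choice) purely to show the label round trip.

```python
def _grid_shape(cell_deg):
    """Number of rows (latitude bands) and columns (longitude bands) in the grid."""
    return int(round(180.0 / cell_deg)), int(round(360.0 / cell_deg))

def latlon_to_cell(lat, lon, cell_deg=10.0):
    """Map a coordinate to an integer class label on a fixed equal-angle grid."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    n_rows, n_cols = _grid_shape(cell_deg)
    row = min(int((lat + 90.0) // cell_deg), n_rows - 1)   # lat = 90 falls in the top row
    col = min(int((lon + 180.0) // cell_deg), n_cols - 1)  # lon = 180 falls in the last column
    return row * n_cols + col

def cell_center(label, cell_deg=10.0):
    """Centre (lat, lon) of the cell with the given label."""
    _, n_cols = _grid_shape(cell_deg)
    row, col = divmod(label, n_cols)
    return (-90.0 + (row + 0.5) * cell_deg, -180.0 + (col + 0.5) * cell_deg)

def predict_location(probs, cell_deg=10.0):
    """Turn a classifier's per-cell probabilities into a point estimate (argmax cell centre)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return cell_center(best, cell_deg)

# A point near Paris, mapped to its training label and back to the cell centre.
label = latlon_to_cell(48.85, 2.35)
print(label, cell_center(label))  # 486 (45.0, 5.0)
```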
Objective: To assess differences in estimated treatment effects for mortality between observational studies using routinely collected health data (RCD) that were published before any trials were available, and subsequent evidence from randomized controlled trials on the same clinical question.
Design: Meta-epidemiological survey.
Data sources: PubMed searched up to November 2014.
Methods: Eligible RCD studies were those published up to 2010 that used propensity scores to address confounding bias and reported comparative effects of interventions for mortality. The analysis included only RCD studies conducted before any trial was published on the same topic. The direction of treatment effects, confidence intervals, and effect sizes (odds ratios) were compared between RCD studies and randomized controlled trials. The relative odds ratio (that is, the summary odds ratio of the trial(s) divided by the RCD study estimate) and the summary relative odds ratio were calculated across all pairs of RCD studies and trials. A summary relative odds ratio greater than one indicates that RCD studies gave more favorable mortality results.
Results: The evaluation included 16 eligible RCD studies and 36 subsequent published randomized controlled trials investigating the same clinical questions (with 17 275 patients and 835 deaths). Trials were published a median of three years after the corresponding RCD study. For five (31%) of the 16 clinical questions, the direction of treatment effects differed between RCD studies and trials. Confidence intervals in nine (56%) RCD studies did not include the trial effect estimate. Overall, RCD studies showed statistically-significantly more favorable mortality estimates by 31% than subsequent trials (summary relative odds ratio 1.31 (95% confidence interval 1.03 to 1.65; I² = 0%)).
Conclusions: Studies of routinely collected health data could give different answers from subsequent randomized controlled trials on the same clinical questions, and may substantially overestimate treatment effects. Caution is needed to prevent misguided clinical decision making.
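The relative odds ratio (ROR) arithmetic in the Methods can be sketched as follows. The sketch assumes that each odds ratio comes with a 95% confidence interval from which a log-scale standard error is recovered, and that trial and RCD estimates are independent. It also assumes pairs are combined with fixed-effect inverse-variance weights. The authors' exact meta-analytic model is not given here, and a random-effects model would add a between-pair variance term. All numbers are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI

def log_se_from_ci(lower, upper):
    """Standard error of a log odds ratio recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)

def relative_odds_ratio(or_trial, ci_trial, or_rcd, ci_rcd):
    """log(ROR) = log(OR_trial) - log(OR_RCD), with its SE assuming independent estimates."""
    log_ror = math.log(or_trial) - math.log(or_rcd)
    se = math.hypot(log_se_from_ci(*ci_trial), log_se_from_ci(*ci_rcd))
    return log_ror, se

def pooled_ror(pairs):
    """Fixed-effect inverse-variance summary ROR and 95% CI from (log ROR, SE) pairs."""
    weights = [1.0 / se**2 for _, se in pairs]
    mean = sum(w * lr for w, (lr, _) in zip(weights, pairs)) / sum(weights)
    se_mean = 1.0 / math.sqrt(sum(weights))
    return (math.exp(mean),
            math.exp(mean - Z95 * se_mean),
            math.exp(mean + Z95 * se_mean))

# Hypothetical pairs: (trial OR, trial 95% CI, RCD OR, RCD 95% CI).
pairs = [
    relative_odds_ratio(0.95, (0.80, 1.13), 0.70, (0.60, 0.82)),
    relative_odds_ratio(1.05, (0.85, 1.30), 0.90, (0.75, 1.08)),
]
summary, lower, upper = pooled_ror(pairs)
print(f"summary ROR {summary:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An ROR above 1 means the trial odds ratio is larger than the RCD one. The RCD study therefore showed lower odds of death, making the intervention look more favorable.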
2005-jussim.pdf: “Teacher expectations and self-fulfilling prophecies: knowns and unknowns, resolved and unresolved controversies”, (2005):
This article shows that 35 years of empirical research on teacher expectations justifies the following conclusions: (a) Self-fulfilling prophecies in the classroom do occur, but these effects are typically small, they do not accumulate greatly across perceivers or over time, and they may be more likely to dissipate than accumulate; (b) powerful self-fulfilling prophecies may selectively occur among students from stigmatized social groups; (c) whether self-fulfilling prophecies affect intelligence, and whether they in general do more harm than good, remains unclear, and (d) teacher expectations may predict student outcomes more because these expectations are accurate than because they are self-fulfilling. Implications for future research, the role of self-fulfilling prophecies in social problems, and perspectives emphasizing the power of erroneous beliefs to create social reality are discussed.
[Jussim discusses the famous ‘Pygmalion effect’. It demonstrates the replication crisis: an initial extraordinary finding indicating that teachers could raise student IQs by dozens of points gradually shrank over repeated replications to essentially zero net long-term effect. The original finding was driven by statistical malpractice bordering on research fraud: some students had “pretest IQ scores near zero, and others had post-test IQ scores over 200”! Rosenthal further maintained the Pygmalion effect by statistical trickery, such as his ‘fail-safe N’, which attempted to show that hundreds of unpublished null studies would have to exist for the Pygmalion effect to be spurious, except that this assumes zero publication bias in those unpublished studies and begs the question.]
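Rosenthal's fail-safe N asks how many unpublished studies would be needed to drag a significant combined result back down to the significance threshold. The sketch below uses the usual Stouffer combination and a one-tailed α = 0.05. Its key assumption, visible in the arithmetic, is that every missing study averages Z = 0, which is exactly the assumption the note above objects to. The input z-scores are hypothetical.

```python
import math

Z_ALPHA = 1.645  # one-tailed critical z for alpha = 0.05

def stouffer_z(z_scores):
    """Combined (Stouffer) z-score of k studies: sum(Z) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def rosenthal_fail_safe_n(z_scores, z_alpha=Z_ALPHA):
    """Number of extra studies averaging Z = 0 needed to pull the combined z down to z_alpha.

    Solves sum(Z) / sqrt(k + N) = z_alpha for N.
    """
    total = sum(z_scores)
    if total <= 0:
        return 0.0  # the combined result is already non-significant
    return max(0.0, total**2 / z_alpha**2 - len(z_scores))

# Hypothetical example: ten published studies, each with Z = 2.0.
published = [2.0] * 10
print(round(stouffer_z(published), 2), round(rosenthal_fail_safe_n(published)))  # 6.32 138
```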
2017-wallach.pdf: “Evaluation of Evidence of Statistical Support and Corroboration of Subgroup Claims in Randomized Clinical Trials”, American Medical Association
2011-anthony.pdf: “The importance of eating local: slaughter and scurvy in Antarctic cuisine”, Jason C. Anthony
Plants have physical and chemical mechanisms for defense from attack by animals. Phytochemical defenses that protect plants from attack by insects include antifeedants, insecticides, and insect growth regulators. Phytochemical options exist by which plants can modulate the fertility of the other major group of plant predators, vertebrate herbivores, and thereby reduce cumulative attacks by those herbivores. The success of such a defense depends upon phytochemical mimicry of vertebrate reproductive hormones. Phytoestrogens do mimic reproductive hormones and are proposed to be defensive substances produced by plants to modulate the fertility of herbivores.
1979-imperatomcginley.pdf: “Androgens and the Evolution of Male-Gender Identity among Male Pseudohermaphrodites with 5α-Reductase Deficiency”, Imperato-McGinley Julianne, Peterson Ralph E., Gautier Teofilo, Sturla Erasmo
2018-teblunthuis.pdf: “Revisiting “The Rise and Decline” in a Population of Peer Production Projects”, Nathan TeBlunthuis, Aaron Shaw, Benjamin Mako Hill