“6 cloned horses help rider win prestigious polo match”; “The Clones of Polo—Adolfo Cambiaso interview” (I had no idea horse cloning had developed to this extent. Cambiaso, polo champion, has cloned his best horse 14 times so far, with 10 more due ~2018, and is winning polo championships on all-clone teams. He says they perform well, similar to the original, despite the skepticism of experts: “‘Every scientist that deals with epigenetics told me this would never work’, says Meeker, referring to the fact that the environment modifies gene activity and helps explain why identical twins aren’t truly identical.” Quite a case study of the benefits of cloning elite individuals.)
“Science is Getting Us Closer to the End of Infertility” (Towards massive embryo selection & IES: “Hayashi…guesses it will take 5 years to produce egg-like cells from other human cells…Adashi is less sure of the timing than he is of the outcome. ‘I don’t think any of us can say how long’, he says. ‘But the progress in rodents was remarkable: In 6 years, we went from nothing to everything. To suggest that this won’t be possible in humans is naive.’”)
Newsletter tag: archive of all issues of the gwern.net newsletter back to 2013. Each monthly update includes summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up any reasonably-interesting changes and send them to the mailing list along with a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as an RSS feed). Submissions are categorized similarly to the monthly newsletter and typically will be collated there.
Joel Spolsky in 2002 identified a major pattern in technology business & economics: the pattern of “commoditizing your complement”, an alternative to vertical integration, where companies seek to secure a chokepoint or quasi-monopoly in products composed of many necessary & sufficient layers by dominating one layer while fostering so much competition in another layer above or below its layer that no competing monopolist can emerge, prices are driven down to marginal costs elsewhere in the stack, total price drops & increases demand, and the majority of the consumer surplus of the final product can be diverted to the quasi-monopolist. A classic example is the commodification of PC hardware by the Microsoft OS monopoly, to the detriment of IBM & benefit of MS.
This pattern explains many otherwise odd or apparently self-sabotaging ventures by large tech companies into apparently irrelevant fields, such as the high rate of releasing open-source contributions by many Internet companies or the intrusion of advertising companies into smartphone manufacturing & web browser development & statistical software & fiber-optic networks & municipal WiFi & radio spectrum auctions & DNS (Google): they are pre-emptive attempts to commodify another company elsewhere in the stack, or defenses against it being done to them.
It can be hard to see the gradual improvement of most goods over time, but I think one way to get a handle on them is to look at their downstream effects: all the small ordinary everyday things which nevertheless depend on obscure innovations, improving cost-performance ratios, gradually dropping costs, new materials, and so on. All of these gradually drop the cost, lower the price, improve the quality at the same price, or remove irritations and limits never explicitly noticed.
It all adds up.
So here is a personal list of small ways in which my ordinary everyday daily life has been getting better since the late ’80s/early ’90s (as far back as I can clearly remember these things—I am sure the list of someone growing up in the 1940s would include many hassles I’ve never known at all).
Cloning is widely used in animal & plant breeding despite steep costs due to its advantages; more unusual recent applications include creating entire polo horse teams and reported trials of cloning in elite police/Special Forces war dogs. Given the cost of dog cloning, however, can this ever make more sense than standard screening methods for selecting from working dog breeds, or would the increase in successful dog training be too low under all reasonable models to turn a profit?
I model the question as one of expected cost per successfully-trained dog, treating success in training as a dichotomous liability-threshold trait with a polygenic genetic architecture; given the extreme level of selection possible in choosing the best among already-elite Special Forces dogs, and a range of heritabilities, this predicts clones’ success probabilities. To approximate the relevant parameters, I look at some reported training costs and success rates for regular dog candidates, broad dog heritabilities, and the few current dog cloning case studies reported in the media.
Since none of the relevant parameters are known with confidence, I run the cost-benefit equation for many hypothetical scenarios, and find that in a large fraction of them covering most plausible values, dog cloning would improve training yields enough to be profitable (in addition to its other advantages).
As further illustration of the use-case of screening for an extreme outcome based on a partial predictor, I consider the question of whether height PGSes could be used to screen the US population for people of NBA height, which turns out to be reasonably doable with current & future PGSes.
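To make the dog-cloning liability-threshold logic concrete, here is a minimal Python sketch of the clone cost-benefit calculation; every parameter value below (baseline success rate, heritability, donor quality, costs) is a hypothetical placeholder, not an estimate from the full analysis:

```python
from scipy.stats import norm

# All values are hypothetical placeholders, not estimates from the analysis.
p_base     = 0.50     # baseline training success rate of regular candidates
h2         = 0.40     # broad-sense heritability of the training-success liability
donor_z    = 3.0      # donor dog's observed liability, in SDs (an elite performer)
cost_dog   = 25_000   # cost to procure & train one regular candidate
cost_clone = 50_000   # additional cost of producing one clone

threshold = norm.ppf(1 - p_base)    # liability threshold implied by the base rate
g_clone   = h2 * donor_z            # expected genetic value inherited by the clone
# The clone re-rolls the environmental component, which has variance 1 - h2:
p_clone = norm.sf(threshold, loc=g_clone, scale=(1 - h2) ** 0.5)

print(f"clone success probability: {p_clone:.2f}")
print(f"cost per success, regular: ${cost_dog / p_base:,.0f}")
print(f"cost per success, clone:   ${(cost_dog + cost_clone) / p_clone:,.0f}")
```

Whether the clone column comes out cheaper depends entirely on where the true parameters fall, which is why the analysis sweeps a grid of scenarios rather than committing to any single set of values.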
Genomic selection—the prediction of breeding values using DNA polymorphisms—is a disruptive method that has widely been adopted by animal and plant breeders to increase crop, forest and livestock productivity and ultimately secure food and energy supplies. It improves breeding schemes in different ways, depending on the biology of the species and genotyping and phenotyping constraints. However, both genomic selection and classical phenotypic selection remain difficult to implement because of the high genotyping and phenotyping costs that typically occur when selecting large collections of individuals, particularly in early breeding generations. To specifically address these issues, we propose a new conceptual framework called phenomic selection, which consists of a prediction approach based on low-cost and high-throughput phenotypic descriptors rather than DNA polymorphisms. We applied phenomic selection on two species of economic interest (wheat and poplar) using near-infrared spectroscopy on various tissues. We showed that one could reach accurate predictions in independent environments for developmental and productivity traits and tolerance to disease. We also demonstrated that under realistic scenarios, one could expect much higher genetic gains with phenomic selection than with genomic selection. Our work constitutes a proof of concept and is the first attempt at phenomic selection; it clearly provides new perspectives for the breeding community, as this approach is theoretically applicable to any organism and does not require any genotypic information.
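The statistical core of phenomic selection is simply substituting cheap spectral measurements for SNP genotypes as the predictor matrix. A toy illustration on simulated data (this is not the authors’ wheat/poplar pipeline, and ridge regression here stands in for the BLUP-type models typically used in breeding):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 300, 500                        # individuals x NIRS wavelengths (simulated)
spectra = rng.normal(size=(n, p))      # stand-in for near-infrared absorbances
beta = rng.normal(size=p) * (rng.random(p) < 0.05)  # a few informative wavelengths
trait = spectra @ beta + rng.normal(scale=2.0, size=n)

# Ridge on spectra plays the role that genomic BLUP plays on SNP matrices:
model = RidgeCV(alphas=np.logspace(-2, 4, 30))
r2 = cross_val_score(model, spectra, trait, cv=5, scoring="r2")
print(f"cross-validated R² of phenomic prediction: {r2.mean():.2f}")
```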
Genealogies are likely the first, centuries-old “big data”, with their construction as old as human civilization itself. Globalization, and the identity crisis that ensued, turned many to online services, building family trees and investigating connections to historical records and other family trees. An explosion has been underway since the beginning of the century in the number and usage of websites offering such genealogical services. About 130 million users combine to have created almost four billion profiles for family members across the three most popular websites of genealogy enthusiasts, Ancestry.com, MyHeritage, and Geni. More recent years have witnessed a similar rapid increase of genetic-based services that address the same need to learn about familial relationships and ancestry. These vast amounts of crowdsourced—and often crowdfunded (as users often pay for these services)—data offer ample scientific research opportunities that would otherwise require expensive collection. In a paper published today in Science, Kaplanis et al. [2, 3] introduce a genealogical dataset based on processing 86 million public Geni profiles. Armed with this crowdsourced dataset, they address fundamental research questions.
Methods for using GWAS to estimate genetic correlations between pairwise combinations of traits have produced “atlases” of genetic architecture. Genetic atlases reveal pervasive pleiotropy, and genome-wide significant loci are often shared across different phenotypes. We introduce genomic structural equation modeling (Genomic SEM), a multivariate method for analyzing the joint genetic architectures of complex traits. Using formal methods for modeling covariance structure, Genomic SEM synthesizes genetic correlations and SNP-heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to identify variants with effects on general dimensions of cross-trait liability, boost power for discovery, and calculate more predictive polygenic scores. Finally, Genomic SEM can be used to identify loci that cause divergence between traits, aiding the search for what uniquely differentiates highly correlated phenotypes. We demonstrate several applications of Genomic SEM, including a joint analysis of GWAS summary statistics from five genetically correlated psychiatric traits. We identify 27 independent SNPs not previously identified in the univariate GWASs, 5 of which have been reported in other published GWASs of the included traits. Polygenic scores derived from Genomic SEM consistently outperform polygenic scores derived from GWASs of the individual traits. Genomic SEM is flexible, open ended, and allows for continuous innovations in how multivariate genetic architecture is modeled.
Social genetic effects (SGE, also called indirect genetic effects) are associations between genotypes of one individual and phenotype of another. SGE arise when two individuals interact and heritable traits of one influence the phenotype of the other. Recent studies have shown that SGE substantially contribute to phenotypic variation in humans and laboratory mice, which suggests that SGE, like direct genetic effects (DGE, effects of an individual’s genes on their own phenotype), are amenable to mapping. Using 170 phenotypes including behavioural, physiological and morphological traits measured in outbred laboratory mice, we empirically explored the potential and challenges of genome-wide association study of SGE (sgeGWAS) as a tool to discover novel mechanisms of social effects between unrelated individuals. For each phenotype we performed sgeGWAS, identifying 21 genome-wide significant SGE associations for 17 phenotypes, and dgeGWAS for comparison. Our results provide three main insights: first, SGE and DGE arise from partially different loci and/or loci with different effect sizes, which implies that the widely-studied mechanism of phenotypic “contagion” is not sufficient to explain all social effects. Second, several DGE associations but no SGE associations had large effects, suggesting sgeGWAS is unlikely to uncover “low hanging fruits”. Finally, a similar number of variants likely contribute to SGE and DGE. The analytical framework we developed in this study and the insights we gained from our analyses will inform the design, implementation and interpretation of sgeGWAS in this and other populations and species.
Sleep is an essential homeostatically-regulated state of decreased activity and alertness conserved across animal species, and both short and long sleep duration associate with chronic disease and all-cause mortality [1, 2]. Defining genetic contributions to sleep duration could point to regulatory mechanisms and clarify causal disease relationships. Through genome-wide association analyses in 446,118 participants of European ancestry from the UK Biobank, we discover 78 loci for self-reported sleep duration that further impact accelerometer-derived measures of sleep duration, daytime inactivity duration, sleep efficiency and number of sleep bouts in a subgroup (n=85,499) with up to 7-day accelerometry. Associations are enriched for genes expressed in several brain regions, and for pathways including striatum and subpallium development, mechanosensory response, dopamine binding, synaptic neurotransmission, catecholamine production, synaptic plasticity, and unsaturated fatty acid metabolism. Genetic correlation analysis indicates shared biological links between sleep duration and psychiatric, cognitive, anthropometric and metabolic traits and Mendelian randomization highlights a causal link of longer sleep with schizophrenia.
“Genome-wide association analyses of chronotype in 697,828 individuals provides new insights into circadian rhythms in humans and links to disease”, Samuel E. Jones, Jacqueline M. Lane, Andrew R. Wood, Vincent T. van Hees, Jessica Tyrrell, Robin N. Beaumont, Aaron R. Jeffries, Hassan S. Dashti, Melvyn Hillsdon, Katherine S. Ruth, Marcus A. Tuke, Hanieh Yaghootkar, Seth A. Sharp, Yingjie Ji, James W. Harrison, Amy Dawes, Enda M. Byrne, Henning Tiemeier, Karla V. Allebrandt, Jack Bowden, David W. Ray, Rachel M. Freathy, Anna Murray, Diego R. Mazzotti, Philip R. Gehrman, the 23andMe Research Team, Deborah A. Lawlor, Timothy M. Frayling, Martin K. Rutter, David A. Hinds, Richa Saxena, Michael N. Weedon (2018-04-19):
Using data from 697,828 research participants from 23andMe and UK Biobank, we identified 351 loci associated with being a morning person, a behavioural indicator of a person’s underlying circadian rhythm. These loci were validated in 85,760 individuals with activity-monitor derived measures of sleep timing: the mean sleep timing of the 5% of individuals carrying the most “morningness” alleles was 25.1 minutes (95% CI: 22.5, 27.6) earlier than the 5% carrying the fewest. The loci were enriched for genes involved in circadian rhythm and insulin pathways, and those expressed in the retina, hindbrain, hypothalamus, and pituitary (all FDR < 1%). We provide some evidence that being a morning person was causally associated with reduced risk of schizophrenia (OR: 0.89; 95% CI: 0.82, 0.96), depression (OR: 0.94; 95% CI: 0.91, 0.98) and a lower age at last childbirth in women (β: −0.046 years; 95% CI: −0.067, −0.025), but was not associated with BMI (β: −4.6×10^−4; 95% CI: −0.044, 0.043) or type 2 diabetes (OR: 1.00; 95% CI: 0.91, 1.1). This study offers new insights into the biology of circadian rhythms and disease links in humans.
Different human populations facing similar environmental challenges have sometimes evolved convergent biological adaptations, for example hypoxia resistance at high altitudes and depigmented skin in northern latitudes on separate continents. The pygmy phenotype (small adult body size), a characteristic of hunter-gatherer populations inhabiting both African and Asian tropical rainforests, is often highlighted as another case of convergent adaptation in humans. However, the degree to which phenotypic convergence in this polygenic trait is due to convergent vs. population-specific genetic changes is unknown. To address this question, we analyzed high-coverage sequence data from the protein-coding portion of the genomes (exomes) of two pairs of populations, Batwa rainforest hunter-gatherers and neighboring Bakiga agriculturalists from Uganda, and Andamanese rainforest hunter-gatherers (Jarawa and Onge) and Brahmin agriculturalists from India. We observed signatures of convergent positive selection between the Batwa and Andamanese rainforest hunter-gatherers across the set of genes with annotated ‘growth factor binding’ functions (p < 0.001). Unexpectedly, for the rainforest groups we also observed convergent and population-specific signatures of positive selection in pathways related to cardiac development (e.g. ‘cardiac muscle tissue development’; p = 0.003). We hypothesize that the growth hormone sub-responsiveness likely underlying the pygmy phenotype may have led to compensatory changes in cardiac pathways, in which this hormone also plays an essential role. Importantly, we did not observe similar patterns of positive selection on sets of genes associated with either growth or cardiac development in the agriculturalist populations, indicating that our results most likely reflect a history of convergent adaptation to the similar ecology of rainforest hunter-gatherers rather than a more common or general evolutionary pattern for human populations.
The role played by natural selection in shaping present-day human populations has received extensive scrutiny [1, 2, 3], especially in the context of local adaptations. However, most studies to date assume, either explicitly or not, that populations have been in their current locations long enough to adapt to local conditions, and that population sizes were large enough to allow for the action of selection. If these conditions were satisfied, not only should selection be effective at promoting local adaptations, but deleterious alleles should also be eliminated over time. To assess this prediction, the genomes of 2,062 individuals, including 1,179 ancient humans, were reanalyzed to reconstruct how frequencies of risk alleles and their homozygosity changed through space and time in Europe. While the overall deleterious homozygosity consistently decreased through space and time, risk alleles have shown a steady increase in frequency. Even the mutations that are predicted to be most deleterious fail to exhibit any significant decrease in frequency. These conclusions do not deny the existence of local adaptations, but highlight the limitations imposed by drift and range expansions on the strength of selection in purging the mutational load affecting human populations.
Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator’s output as a reward signal is the key to allow the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real world (MNIST, OMNIGLOT, CELEBA) and synthetic 3D datasets. A video of the agent can be found on YouTube.
“Learning to Navigate in Cities Without a Map”, Piotr Mirowski, Matthew Koichi Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell (2018-03-31):
Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage http://streetlearn.cc contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at https://github.com/deepmind/streetlearn
Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en-masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
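The key reformulation in PlaNet is treating geolocation as classification over a discretized Earth. Here is a toy quadtree-style cell id in Python; the paper actually uses Google’s S2 cells subdivided adaptively by photo density, so this sketch only illustrates the coordinate-to-class mapping:

```python
def latlon_to_cell(lat: float, lon: float, depth: int = 10) -> str:
    """Map a coordinate to a quadtree cell id by repeatedly halving
    the lat/lon bounding box -- a toy stand-in for PlaNet's adaptive
    S2-based multi-scale cells."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(depth):
        lat_mid, lon_mid = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
        digits.append(str((lat >= lat_mid) * 2 + (lon >= lon_mid)))
        lat_lo, lat_hi = (lat_mid, lat_hi) if lat >= lat_mid else (lat_lo, lat_mid)
        lon_lo, lon_hi = (lon_mid, lon_hi) if lon >= lon_mid else (lon_lo, lon_mid)
    return "".join(digits)

# e.g. the Eiffel Tower; each added digit refines the cell roughly 4x:
print(latlon_to_cell(48.8584, 2.2945, depth=8))
```

A classifier then predicts one of these cell ids per photo, turning “where was this taken?” into an ordinary softmax problem.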
Objective: To assess differences in estimated treatment effects for mortality between observational studies with routinely collected health data (RCD; that are published before trials are available) and subsequent evidence from randomized controlled trials on the same clinical question.
Design: Meta-epidemiological survey.
Data sources: PubMed searched up to November 2014.
Methods: Eligible RCD studies were published up to 2010 that used propensity scores to address confounding bias and reported comparative effects of interventions for mortality. The analysis included only RCD studies conducted before any trial was published on the same topic. The direction of treatment effects, confidence intervals, and effect sizes (odds ratios) were compared between RCD studies and randomized controlled trials. The relative odds ratio (that is, the summary odds ratio of trial(s) divided by the RCD study estimate) and the summary relative odds ratio were calculated across all pairs of RCD studies and trials. A summary relative odds ratio greater than one indicates that RCD studies gave more favorable mortality results.
Results: The evaluation included 16 eligible RCD studies, and 36 subsequent published randomized controlled trials investigating the same clinical questions (with 17,275 patients and 835 deaths). Trials were published a median of three years after the corresponding RCD study. For five (31%) of the 16 clinical questions, the direction of treatment effects differed between RCD studies and trials. Confidence intervals in nine (56%) RCD studies did not include the RCT effect estimate. Overall, RCD studies showed mortality estimates that were significantly more favorable, by 31%, than those of subsequent trials (summary relative odds ratio 1.31 (95% confidence interval 1.03 to 1.65; I² = 0%)).
Conclusions: Studies of routinely collected health data could give different answers from subsequent randomized controlled trials on the same clinical questions, and may substantially overestimate treatment effects. Caution is needed to prevent misguided clinical decision making.
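The comparison metric described in the Methods reduces to a few lines of arithmetic. A sketch with made-up odds ratios (the paper pools inverse-variance-weighted log-RORs; this unweighted version only illustrates the definition):

```python
import numpy as np

# Hypothetical (made-up) OR pairs: (trial OR, RCD-study OR) per clinical question.
pairs = [(1.10, 0.80), (0.95, 0.70), (1.30, 1.20)]

# Relative odds ratio per pair: trial OR divided by the RCD estimate.
# For mortality, OR < 1 is favorable, so ROR > 1 means the observational
# study gave the more favorable result.
rors = [trial / rcd for trial, rcd in pairs]

# Summary on the log scale (unweighted here for simplicity; the paper
# weights each pair by inverse variance):
summary_ror = np.exp(np.mean(np.log(rors)))
print(f"per-question RORs: {[round(r, 2) for r in rors]}")
print(f"summary ROR: {summary_ror:.2f}")
```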
Statistical folklore asserts that “everything is correlated”: in any real-world dataset, most or all measured variables will have non-zero correlations, even between variables which appear to be completely independent of each other, and that these correlations are not merely sampling error flukes but will appear in large-scale datasets to arbitrarily designated levels of statistical-significance or posterior probability.
This raises serious questions for null-hypothesis statistical-significance testing, as it implies that the null hypothesis of zero effect will always be rejected given sufficient data, meaning that a failure to reject only implies insufficient data and provides no actual test or confirmation of a theory. Even a directional prediction is minimally confirmatory since there is a 50% chance of picking the right direction at random.
It also has implications for conceptualizations of theories & causal models, interpretations of structural models, and other statistical principles such as the “sparsity principle”.
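A small simulation makes the point: fix a “trivial” population correlation and watch statistical-significance become a pure function of sample size (the values below are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_r = 0.02                       # a "trivial" population correlation
for n in [100, 10_000, 1_000_000]:
    x = rng.normal(size=n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
    r, p = stats.pearsonr(x, y)
    print(f"n={n:>9,}: r={r:+.3f}, p={p:.2g}")
# At n=100 the null survives; by n=1,000,000 it is decisively "rejected",
# illustrating that failure to reject mostly measures sample size.
```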
Plants have physical and chemical mechanisms for defense from attack by animals. Phytochemical defenses that protect plants from attack by insects include antifeedants, insecticides, and insect growth regulators. Phytochemical options exist by which plants can modulate the fertility of the other major group of plant predators, vertebrate herbivores, and thereby reduce cumulative attacks by those herbivores. The success of such a defense depends upon phytochemical mimicry of vertebrate reproductive hormones. Phytoestrogens do mimic reproductive hormones and are proposed to be defensive substances produced by plants to modulate the fertility of herbivores.
Francisco Muniz IV is an American actor, race-car driver and musician. He is best known for playing the title character in the Fox television family sitcom Malcolm in the Middle, which earned him an Emmy Award nomination and two Golden Globe Award nominations. He is also known for his film roles in Deuces Wild (2002), Big Fat Liar (2002), Agent Cody Banks (2003), and Racing Stripes (2005). At the height of his fame, he was considered one of the most popular child actors and in 2003, "one of Hollywood's most bankable teens". In 2008, Muniz put his acting career on hold to pursue an open-wheel racing career, and competed in the Atlantic Championship. From 2012 to 2014, he was a drummer in the band Kingsfoil.
Malcolm in the Middle is an American television sitcom created by Linwood Boomer for Fox. The series premiered on January 9, 2000, and ended its six-year run on May 14, 2006, after seven seasons and 151 episodes. The series received critical acclaim and won a Peabody Award, seven Emmy Awards, one Grammy Award and seven Golden Globe nominations.
"Pangur Bán" is an Old Irish poem, written about the 9th century at or near Reichenau Abbey by an Irish monk about his cat. Pangur Bán, 'White Pangur', is the cat's name, Pangur meaning 'a fuller'. Although the poem is anonymous, it bears similarities to the poetry of Sedulius Scottus, prompting speculation that he is the author. In eight verses of four lines each, the author compares the cat's happy hunting with his own scholarly pursuits.
A Quiet Place is a 2018 American horror film directed by and starring John Krasinski. Written by Bryan Woods, Scott Beck and Krasinski, the plot revolves around a father (Krasinski) and a mother who struggle to survive and raise their children in a post-apocalyptic world inhabited by blind extraterrestrial monsters with an acute sense of hearing.
Subscription page for the monthly gwern.net newsletter. Each monthly update includes summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives, which go back to December 2013.
Analogous to the dog cloning scenario, I consider the case of selecting for extremes on PGSes, motivated by a scenario of scouting tall men for the NBA.
Setting up the NBA selection problem as a liability threshold model with current height PGSes as a noisy predictor, height selection can be modeled as selecting for extremes on a PGS which is regressed back to the mean to yield expected adult height, and the probability of being tall enough to consider an NBA career.
Filling in reasonable values, nontrivial numbers of tall people can be found by genomic screening with a current PGS, and as PGSes approach their predictive upper bound (derived from whole-genome-based heritability estimates of height), screening can identify almost all tall people by taking the top PGS percentile.
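The screening logic reduces to a bivariate-normal simulation. A minimal Python sketch in which the variance explained by the PGS, the height cutoff, and the screening fraction are all placeholder values, not the estimates from the full analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
r2_pgs   = 0.40      # hypothetical variance in height explained by the PGS
z_nba    = 3.0       # hypothetical "NBA height" cutoff, in SDs of male height
top_frac = 0.01      # screening rule: keep the top 1% of PGS scores

r = np.sqrt(r2_pgs)                   # correlation of PGS with adult height
n = 2_000_000                         # simulated male population
pgs    = rng.normal(size=n)
height = r * pgs + np.sqrt(1 - r2_pgs) * rng.normal(size=n)

cut = np.quantile(pgs, 1 - top_frac)
base_rate   = (height > z_nba).mean()
screen_rate = (height[pgs > cut] > z_nba).mean()
print(f"P(NBA height) overall:         {base_rate:.5f}")
print(f"P(NBA height | top 1% of PGS): {screen_rate:.4f}")
print(f"enrichment: {screen_rate / base_rate:.0f}x")
```

Raising `r2_pgs` toward the heritability ceiling shows the regression-to-the-mean penalty shrinking and the enrichment factor climbing, which is the paper’s point about future PGSes.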