newsletter/2016/06 (Link Bibliography)

“newsletter/2016/06” links:


  2. 05

  3. Changelog


  5. Hunter

  6. Statistical-notes#inferring-mean-iqs-from-smpytip-elite-samples

  7. Newton

  8. 2016-belsky.pdf: ⁠, Daniel W. Belsky, Terrie E. Moffitt, David L. Corcoran, Benjamin Domingue, HonaLee Harrington, Sean Hogan, Renate Houts, Sandhya Ramrakha, Karen Sugden, Benjamin S. Williams, Richie Poulton, Avshalom Caspi (2016-06-01; genetics / correlation):

    A previous genome-wide association study (GWAS) of more than 100,000 individuals identified molecular-genetic predictors of educational attainment.

    We undertook in-depth life-course investigation of the polygenic score derived from this GWAS using the 4-decade Dunedin Study (N = 918). There were 5 main findings.

    1. Polygenic scores predicted adult economic outcomes even after accounting for educational attainments.
    2. Genes and environments were correlated: children with higher polygenic scores were born into better-off homes.
    3. Children’s polygenic scores predicted their adult outcomes even when analyses accounted for their social-class origins; social-mobility analysis showed that children with higher polygenic scores were more upwardly mobile than children with lower scores.
    4. Polygenic scores predicted behavior across the life course, from early acquisition of speech and reading skills through geographic mobility and mate choice and on to financial planning for retirement.
    5. Polygenic-score associations were mediated by psychological characteristics, including intelligence, self-control, and interpersonal skill.

    Effect sizes were small.

    Factors connecting GWAS sequence with life outcomes may provide targets for interventions to promote population-wide positive development.

    [Keywords: genetics, behavior genetics, intelligence, personality, adult development]
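    In its simplest form, a polygenic score like the one studied here is a weighted count of a person's effect alleles, with weights taken from GWAS effect estimates. The sketch below uses made-up genotypes and hypothetical SNP weights; real scores (presumably including this one) also involve variant QC, linkage-disequilibrium handling, and ancestry adjustments that it omits.

```python
import numpy as np

def raw_polygenic_score(genotypes, weights):
    """Weighted allele count per person.
    genotypes: (people, snps) counts of the effect allele, each 0, 1 or 2.
    weights:   (snps,) per-allele effect sizes from a GWAS (hypothetical here)."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)

def standardize(scores):
    """Rescale to mean 0 and SD 1 within the sample, since scores are usually reported in SD units."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

genotypes = [[0, 1, 2],
             [2, 2, 2],
             [0, 0, 0]]
weights = [0.1, 0.2, -0.05]  # illustrative effect sizes, not real GWAS estimates
print(standardize(raw_polygenic_score(genotypes, weights)))
```

    Because the score is linear in allele counts, each SNP contributes independently; any correlation between scores and family background (finding 2 above) comes from who carries the alleles, not from the scoring rule.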


  10. 2014-barnes.pdf

  11. ⁠, Kaili Rimfeld, Yulia Kovas, Philip S. Dale, Robert Plomin (2015):

    Research has shown that genes play an important role in educational achievement. A key question is the extent to which the same genes affect different academic subjects before and after controlling for general intelligence. The present study investigated genetic and environmental influences on, and links between, the various subjects of the age-16 UK-wide standardized GCSE (General Certificate of Secondary Education) examination results for 12,632 twins. Using the twin method that compares identical and non-identical twins, we found that all GCSE subjects were substantially heritable, and that various academic subjects correlated substantially both phenotypically and genetically, even after controlling for intelligence. Further evidence for pleiotropy in academic achievement was found using a method based directly on DNA from unrelated individuals. We conclude that performance differences for all subjects are highly heritable at the end of compulsory education and that many of the same genes affect different subjects independent of intelligence.
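    The logic of the twin comparison can be illustrated with Falconer's formula, a classic back-of-the-envelope estimator: identical (MZ) twins share all segregating genes and non-identical (DZ) twins about half, so doubling the gap between their correlations estimates heritability. This is a simplification; the study presumably fit full biometric (ACE) models, and the correlations in the usage line are invented.

```python
def falconer_ace(r_mz, r_dz):
    """Falconer's approximations from MZ and DZ twin correlations.
    Returns (a2, c2, e2): additive genetic, shared-environment and
    non-shared-environment (plus measurement error) variance shares."""
    a2 = 2 * (r_mz - r_dz)  # heritability
    c2 = r_mz - a2          # equivalently 2*r_dz - r_mz
    e2 = 1 - r_mz           # whatever makes identical twins differ
    return a2, c2, e2

# Hypothetical exam-score correlations for illustration only.
print(falconer_ace(0.8, 0.5))  # roughly (0.6, 0.2, 0.2)
</test_placeholder_removed>
```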



  14. 2016-bagnall.pdf: ⁠, Richard D. Bagnall, Robert G. Weintraub, Jodie Ingles, Johan Duflou, Laura Yeates, Lien Lam, Andrew M. Davis, Tina Thompson, Vanessa Connell, Jennie Wallace, Charles Naylor, Jackie Crawford, Donald R. Love, Lavinia Hallam, Jodi White, Christopher Lawrence, Matthew Lynch, Natalie Morgan, Paul James, Desirée du Sart, Rajesh Puranik, Neil Langlois, Jitendra Vohra, Ingrid Winship, John Atherton, Julie McGaughran, Jonathan R. Skinner, Christopher Semsarian (2016-06-23; genetics / heritable):

    Background: Sudden cardiac death among children and young adults is a devastating event. We performed a prospective, population-based, clinical and genetic study of sudden cardiac death among children and young adults.

    Methods: We prospectively collected clinical, demographic, and autopsy information on all cases of sudden cardiac death among children and young adults 1 to 35 years of age in Australia and New Zealand from 2010 through 2012. In cases that had no cause identified after a comprehensive autopsy that included toxicologic and histologic studies (unexplained sudden cardiac death), at least 59 cardiac genes were analyzed for a clinically relevant cardiac gene mutation.

    Results: A total of 490 cases of sudden cardiac death were identified. The annual incidence was 1.3 cases per 100,000 persons 1 to 35 years of age; 72% of the cases involved boys or young men. Persons 31 to 35 years of age had the highest incidence of sudden cardiac death (3.2 cases per 100,000 persons per year), and persons 16 to 20 years of age had the highest incidence of unexplained sudden cardiac death (0.8 cases per 100,000 persons per year). The most common explained causes of sudden cardiac death were coronary artery disease (24% of cases) and inherited cardiomyopathies (16% of cases). Unexplained sudden cardiac death (40% of cases) was the predominant finding among persons in all age groups, except for those 31 to 35 years of age, for whom coronary artery disease was the most common finding. Younger age and death at night were independently associated with unexplained sudden cardiac death as compared with explained sudden cardiac death. A clinically relevant cardiac gene mutation was identified in 31 of 113 cases (27%) of unexplained sudden cardiac death in which genetic testing was performed. During follow-up, a clinical diagnosis of an inherited cardiovascular disease was identified in 13% of the families in which an unexplained sudden cardiac death occurred.

    Conclusions: The addition of genetic testing to autopsy investigation substantially increased the identification of a possible cause of sudden cardiac death among children and young adults.

  15. 2010-wade.pdf


  17. ⁠, Jingmei Li, Jia Nee Foo, Nils Schoof, Jajini S. Varghese, Pablo Fernandez-Navarro, Gretchen L. Gierach, Swee Tian Quek, Mikael Hartman, Silje Nord, Vessela N. Kristensen, Marina Pollán, Jonine D. Figueroa, Deborah J. Thompson, Yi Li, Chiea Chuen Khor, Keith Humphreys, Jianjun Liu, Kamila Czene, Per Hall (2013):

    Background: Individual differences in breast size are a conspicuous feature of variation in human females and have been associated with fecundity and advantage in selection of mates. To identify common variants that are associated with breast size, we conducted a large-scale genotyping association study in 7,169 women of European descent across three independent sample collections with digital or screen-film mammograms.

    Methods: The samples consisted of the Swedish KARMA, LIBRO-1 and SASBAC studies, genotyped on iCOGS, a custom Illumina iSelect genotyping array comprising 211,155 single-nucleotide polymorphisms (SNPs) designed for replication and fine mapping of common and rare variants with relevance to breast, ovarian and prostate cancer. The breast size of each subject was ascertained by measuring total breast area (mm²) on a mammogram.

    Results: We confirm genome-wide statistically-significant associations at 8p11.23 (rs10086016, p = 1.3×10⁻¹⁴) and report a new locus at 22q13 (rs5995871, p = 3.2×10⁻⁸). The latter region contains the MKL1 gene, which has been shown to impact endogenous oestrogen receptor α transcriptional activity and is recruited on oestradiol-sensitive genes. We also replicated previous genome-wide association study findings for breast size at four other loci.

    Conclusions: A new locus at 22q13 may be associated with female breast size.

  18. ⁠, Dalton Conley, Thomas Laidley, Daniel W. Belsky, Jason M. Fletcher, Jason D. Boardman, Benjamin W. Domingue (2016-05-31; genetics / selection / dysgenics):

    We describe dynamics in mating and fertility patterns by polygenic scores associated with anthropometric traits, depression, and educational attainment across birth cohorts from 1920 to 1955. We find that, for example, increases in assortative mating at the phenotypic level for education are not matched at the genotypic level. We also show that genes related to height are positively associated with fertility and that, despite a widening gap between the more and less educated with respect to fertility, there is no evidence that this trend is associated with genes. These findings are important to our understanding of the roots of shifting distributions of health and behavior across generations in US society.

    This study asks two related questions about the shifting landscape of marriage and reproduction in US society over the course of the last century with respect to a range of health and behavioral phenotypes and their associated genetic architecture: (1) Has assortment on measured genetic factors influencing reproductive and social fitness traits changed over the course of the 20th century? (2) Has the genetic covariance between fitness (as measured by total fertility) and other traits changed over time? The answers to these questions inform our understanding of how the genetic landscape of American society has changed over the past century and have implications for population trends. We show that husbands and wives carry similar loadings for genetic factors related to education and height. However, the magnitude of this similarity is modest and has been fairly consistent over the course of the 20th century. This consistency is particularly notable in the case of education, for which phenotypic similarity among spouses has increased in recent years. Likewise, changing patterns of the number of children ever born by phenotype are not matched by shifts in genotype-fertility relationships over time. Taken together, these trends provide no evidence that social sorting is becoming increasingly genetic in nature or that dysgenic dynamics have accelerated.

    [Keywords: assortative mating, fertility, polygenic scores, cohort trends]










  28. ⁠, Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos (2016-06-06):

    We consider an agent’s uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma’s Revenge.
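    A minimal sketch of the pseudo-count idea, assuming the paper's definitions: if ρ is the model's probability of x and ρ′ the probability it would assign after one more update on x (the "recoding probability"), then N̂ = ρ(1 − ρ′)/(ρ′ − ρ), and the intrinsic reward shrinks like 1/√N̂. The toy smoothed categorical model below is a hypothetical stand-in for the pixel-level density model used on Atari, and β = 0.05 is only an illustrative default.

```python
import math
from collections import Counter

class SmoothedCategoricalDensity:
    """Toy density model over a finite alphabet with additive smoothing `alpha`.
    With alpha=0, call prob() only after at least one update (otherwise 0/0)."""

    def __init__(self, alphabet_size, alpha=0.0):
        self.k = alphabet_size
        self.alpha = alpha
        self.counts = Counter()
        self.n = 0

    def prob(self, x):
        """rho(x): probability of x under the current model."""
        return (self.counts[x] + self.alpha) / (self.n + self.k * self.alpha)

    def recoding_prob(self, x):
        """rho'(x): probability of x if the model were updated once more on x (no update is made)."""
        return (self.counts[x] + 1 + self.alpha) / (self.n + 1 + self.k * self.alpha)

    def update(self, x):
        self.counts[x] += 1
        self.n += 1


def pseudo_count(model, x):
    """N_hat(x) = rho * (1 - rho') / (rho' - rho); requires rho' > rho."""
    rho, rho_prime = model.prob(x), model.recoding_prob(x)
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat, beta=0.05):
    """Count-based intrinsic reward that decays as the pseudo-count grows."""
    return beta / math.sqrt(n_hat + 0.01)


model = SmoothedCategoricalDensity(alphabet_size=4)
for symbol in "aab":
    model.update(symbol)
print(pseudo_count(model, "a"), exploration_bonus(pseudo_count(model, "a")))
```

    With an unsmoothed empirical model the pseudo-count reduces algebraically to the ordinary visit count (here about 2 for "a"); with smoothing it becomes count + α. The interesting case in the paper is a model that generalizes across similar frames, where N̂ is no longer an integer.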


  30. ⁠, Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen (2016-06-10):

    We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
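    One of the training procedures in this paper is feature matching: instead of directly maximizing the discriminator's error, the generator is trained to match the mean of intermediate discriminator features on real and generated batches. A numpy sketch that only evaluates that objective; the feature arrays are assumed inputs, and a real implementation would compute this inside an autodiff framework so gradients reach the generator.

```python
import numpy as np

def feature_matching_loss(real_features, fake_features):
    """Squared L2 distance between batch-mean discriminator features.
    real_features, fake_features: (batch, d) activations from an intermediate
    discriminator layer on real and on generated samples."""
    mu_real = np.asarray(real_features, dtype=float).mean(axis=0)
    mu_fake = np.asarray(fake_features, dtype=float).mean(axis=0)
    diff = mu_real - mu_fake
    return float(diff @ diff)

print(feature_matching_loss([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]))  # 4.0
```

    Matching only first moments is a weaker target than fooling the discriminator, which is part of why it tends to stabilize training.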

  31. ⁠, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel (2016-06-12):

    This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
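    The mutual-information lower bound can be estimated with an auxiliary network Q that tries to recover the latent code c from a generated sample: L_I = E[log Q(c | G(z, c))] + H(c). The numpy sketch below assumes a categorical code with a known, strictly positive prior, and takes Q's logits as given; the network producing them is out of scope.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def mi_lower_bound(q_logits, codes, prior_probs):
    """Monte Carlo estimate of L_I = E[log Q(c|x)] + H(c) for a categorical code.
    q_logits:    (batch, k) outputs of Q on generated samples x = G(z, c)
    codes:       (batch,) integer codes c that were fed to the generator
    prior_probs: (k,) categorical prior P(c), all entries > 0"""
    log_q = log_softmax(np.asarray(q_logits, dtype=float))
    expected_log_q = log_q[np.arange(len(codes)), codes].mean()
    p = np.asarray(prior_probs, dtype=float)
    entropy = -(p * np.log(p)).sum()
    return float(expected_log_q + entropy)
```

    The bound ranges from about 0 (Q can't tell codes apart) up to H(c) (Q recovers every code), so maximizing it pushes the generator to make the code visibly affect its output.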

  32. ⁠, Irina Higgins, Loic Matthey, Xavier Glorot, Arka Pal, Benigno Uria, Charles Blundell, Shakir Mohamed, Alexander Lerchner (2016-06-17):

    Automated discovery of early visual concepts from raw image data is a major open challenge in AI research. Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures as have been suggested to act in the ventral visual stream in the brain. By enforcing redundancy reduction, encouraging statistical independence, and exposure to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors. Our approach makes few assumptions and works well across a wide variety of datasets. Furthermore, our solution has useful emergent properties, such as zero-shot inference and an intuitive understanding of “objectness”.
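    The abstract doesn't state the objective. In the β-VAE formulation usually associated with this work, the learning pressures it describes are applied by up-weighting the KL term of a standard VAE loss (β > 1). A numpy sketch evaluating that per-example loss, assuming a diagonal-Gaussian encoder and a Bernoulli decoder; β = 4 is illustrative, not a value taken from the paper.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0, axis=-1)

def bernoulli_nll(x, x_prob, eps=1e-7):
    """Reconstruction negative log-likelihood for binary pixels."""
    x = np.asarray(x, dtype=float)
    p = np.clip(np.asarray(x_prob, dtype=float), eps, 1 - eps)
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p), axis=-1)

def beta_vae_loss(x, x_prob, mu, logvar, beta=4.0):
    """Negative beta-weighted ELBO; beta=1 recovers the ordinary VAE loss."""
    return bernoulli_nll(x, x_prob) + beta * gaussian_kl(mu, logvar)
```

    Raising β trades reconstruction quality for a posterior closer to the factorized prior, which is the pressure toward statistically independent latent factors.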

  33. ⁠, Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell (2016-06-15):

    Learning to solve complex sequences of tasks, while both leveraging transfer and avoiding catastrophic forgetting, remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
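    A toy fully connected sketch of the lateral-connection idea, assuming the layer rule h_i^(k) = f(W_i^(k) h_(i−1)^(k) + Σ_(j<k) U_i^(k:j) h_(i−1)^(j)): each new column gets fresh weights plus lateral inputs from earlier columns, which are never modified. ReLU everywhere, the layer sizes, and the initialization are simplifications for brevity; training (updating only the newest column) is not shown.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

class ProgressiveNet:
    """Column k has weights W[k][i] per layer i and, for layers i >= 1, lateral
    matrices U[k][i][j] reading the previous layer of each earlier column j < k."""

    def __init__(self, layer_sizes, seed=0):
        self.sizes = list(layer_sizes)
        self.rng = np.random.default_rng(seed)
        self.W, self.U = [], []

    def _init(self, n_out, n_in):
        return self.rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))

    def add_column(self):
        k = len(self.W)  # number of existing (frozen) columns
        n_layers = len(self.sizes) - 1
        self.W.append([self._init(self.sizes[i + 1], self.sizes[i]) for i in range(n_layers)])
        # Layer 0 reads the raw input, so it gets no lateral connections.
        self.U.append([[self._init(self.sizes[i + 1], self.sizes[i]) for _ in range(k)] if i >= 1 else []
                       for i in range(n_layers)])

    def forward(self, x):
        """Return the output of every column for input x."""
        acts = []  # acts[j][i] is the input to layer i of column j
        for k in range(len(self.W)):
            h = [np.asarray(x, dtype=float)]
            for i, W in enumerate(self.W[k]):
                pre = W @ h[i]
                for j, U in enumerate(self.U[k][i]):  # empty for layer 0
                    pre = pre + U @ acts[j][i]
                h.append(relu(pre))
            acts.append(h)
        return [h[-1] for h in acts]
```

    Because adding a column never touches earlier weights, the earlier columns' outputs are unchanged, which is the sense in which the architecture is immune to forgetting.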

  34. ⁠, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov (2015-11-19):

    The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent. Towards this goal, we define a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains. This method, termed “Actor-Mimic”, exploits the use of deep reinforcement learning and model compression techniques to train a single policy network that learns how to act in a set of distinct tasks by using the guidance of several expert teachers. We then show that the representations learnt by the deep policy network are capable of generalizing to new tasks with no expert guidance, speeding up learning in novel environments. Although our method can in general be applied to a wide range of problems, we use Atari games as a testing environment to demonstrate these methods.
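    The core of Actor-Mimic, as I read the method, is "policy regression": each expert's Q-values are turned into a Boltzmann policy softmax(Q_E / τ), and the single student network is trained with cross-entropy against it. The paper also adds a feature-regression term, which this numpy sketch omits; the temperature value in the check is arbitrary.

```python
import numpy as np

def log_softmax(logits):
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def policy_regression_loss(expert_q, student_logits, tau=1.0):
    """Mean cross-entropy between the expert's Boltzmann policy softmax(Q_E / tau)
    and the student's softmax policy, over a batch of states."""
    target = np.exp(log_softmax(np.asarray(expert_q, dtype=float) / tau))
    return float(-(target * log_softmax(student_logits)).sum(axis=-1).mean())
```

    Distilling softened action distributions rather than raw Q-values sidesteps the fact that different games' rewards, and hence Q-value scales, are not comparable.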

  35. ⁠, Adam H. Marblestone, Greg Wayne, Konrad P. Kording (2016-08-22):

    Neuroscience has focused on the detailed implementation of computation, studying neural codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a cost function, often using simple and relatively uniform initial architectures. Two recent developments have emerged within machine learning that create an opportunity to connect these seemingly divergent perspectives. First, structured architectures are used, including dedicated systems for attention, recursion and various forms of short-term and long-term memory storage. Second, cost functions and training procedures have become more complex and are varied across layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that (1) the brain optimizes cost functions, (2) the cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. In support of these hypotheses, we argue that a range of implementations of credit assignment through multiple layers of neurons are compatible with our current knowledge of neural circuitry, and that the brain’s specialized systems can be interpreted as enabling efficient optimization for specific problem classes. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism. We suggest directions by which neuroscience could seek to refine and test these hypotheses.

  36. ⁠, Alex A. Alemi, Francois Chollet, Niklas Een, Geoffrey Irving, Christian Szegedy, Josef Urban (2016-06-14):

    We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.

  37. ⁠, Yağmur Güçlütürk, Umut Güçlü, Rob van Lier, Marcel A. J. van Gerven (2016-06-09):

    In this paper, we use deep neural networks for inverting face sketches to synthesize photorealistic face images. We first construct a semi-simulated dataset containing a very large number of computer-generated face sketches with different styles and corresponding face images by expanding existing unconstrained face data sets. We then train models achieving state-of-the-art results on both computer-generated sketches and hand-drawn sketches by leveraging recent advances in deep learning such as batch normalization, deep residual learning, perceptual losses and stochastic optimization in combination with our new dataset. We finally demonstrate potential applications of our models in fine arts and forensic arts. In contrast to existing patch-based approaches, our deep-neural-network-based approach can be used for synthesizing photorealistic face images by inverting face sketches in the wild.

  38. ⁠, Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas (2016-06-14):

    The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
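    The learned-optimizer setup can be written compactly. The notation below is a paraphrase, assuming an optimizer m with parameters φ and recurrent state h_t, applied to an optimizee f with parameters θ, and per-step weights w_t on the meta-loss:

```latex
\nabla_t = \nabla_\theta f(\theta_t), \qquad
\begin{bmatrix} g_t \\ h_{t+1} \end{bmatrix} = m\!\left(\nabla_t,\, h_t,\, \phi\right), \qquad
\theta_{t+1} = \theta_t + g_t

\mathcal{L}(\phi) = \mathbb{E}_{f}\!\left[\, \sum_{t=1}^{T} w_t\, f(\theta_t) \right]
```

    Training φ by gradient descent on 𝓛 over a distribution of problems f is what lets the LSTM exploit structure shared by that problem class; plain SGD is the special case g_t = −α∇_t with no learned state.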

  39. ⁠, Sacha Epskamp, Mijke Rhemtulla, Denny Borsboom (2016-05-30):

    We introduce the network model as a formal psychometric model, conceptualizing the covariance between psychometric indicators as resulting from pairwise interactions between observable variables in a network structure. This contrasts with standard psychometric models, in which the covariance between test items arises from the influence of one or more common variables. Here, we present two generalizations of the network model that encompass latent variable structures, establishing network modeling as part of the more general framework of structural equation modeling (SEM). In the first generalization, we model the covariance structure of latent variables as a network. We term this framework Latent Network Modeling (LNM) and show that, with LNM, a unique structure of conditional independence relationships between latent variables can be obtained in an explorative manner. In the second generalization, the residual variance-covariance structure of indicators is modeled as a network. We term this generalization Residual Network Modeling (RNM) and show that, within this framework, identifiable models can be obtained in which local independence is structurally violated. These generalizations allow for a general modeling framework that can be used to fit, and compare, SEM models, network models, and the RNM and LNM generalizations. This methodology has been implemented in the free-to-use software package lvnet, which contains confirmatory model testing as well as two exploratory search algorithms: stepwise search algorithms for low-dimensional datasets and penalized estimation for larger datasets. We show in simulation studies that these search algorithms perform adequately in identifying the structure of the relevant residual or latent networks. We further demonstrate the utility of these generalizations in an empirical example on a personality inventory dataset.
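    In the Gaussian case, the edges of a psychometric network are partial correlations, which can be read off the inverse covariance (precision) matrix K as ρ_ij = −K_ij / √(K_ii K_jj); a zero entry means conditional independence. This numpy sketch is the plain Gaussian-graphical-model computation on a made-up three-variable chain, not lvnet's estimation procedure (which adds latent variables and model search).

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation of each pair of variables given all the others."""
    K = np.linalg.inv(np.asarray(cov, dtype=float))  # precision matrix
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

# Hypothetical chain X1 - X2 - X3: X1 and X3 interact only through X2.
K = np.array([[1.0, -0.4, 0.0],
              [-0.4, 1.0, -0.4],
              [0.0, -0.4, 1.0]])
cov = np.linalg.inv(K)
print(np.round(cov, 3))                        # X1 and X3 are marginally correlated...
print(np.round(partial_correlations(cov), 3))  # ...but have no edge in the network
```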

  40. ⁠, Jacob Westfall, Tal Yarkoni (2016-03-17):

    Social scientists often seek to demonstrate that a construct has incremental validity over and above other related constructs. However, these claims are typically supported by measurement-level models that fail to consider the effects of measurement (un)reliability. We use intuitive examples, Monte Carlo simulations, and a novel analytical framework to demonstrate that common strategies for establishing incremental construct validity using multiple regression analysis exhibit extremely high Type I error rates under parameter regimes common in many psychological domains. Counterintuitively, we find that error rates are highest—in some cases approaching 100%—when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious. We present a web application (http:    /​ ​​ ​​ ​​ ​    /​ ​​ ​​ ​​ ​    /​ ​​ ​​ ​​ ​ivy    /​ ​​ ​​ ​​ ​) that readers can use to explore the statistical properties of these and other incremental validity arguments. We conclude by reviewing SEM-based statistical approaches that appropriately control the Type I error rate when attempting to establish incremental validity.
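    The mechanism can be reproduced with a small Monte Carlo simulation in the spirit of the paper (the parameter values here are illustrative assumptions, not the authors'). The outcome y depends only on a latent construct x, and a second construct z merely correlates with x. Because x is measured with error, regressing y on the noisy measure of x and on z leaves residual x-signal that z partly absorbs, so z looks "incrementally valid."

```python
import numpy as np

def spurious_rejection_rate(n, reliability=0.6, r_xz=0.5, b=0.5, reps=200, seed=0):
    """Fraction of simulated samples in which z appears to add predictive power
    (|t| > 1.96) even though y depends only on the latent x."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)                                    # latent construct
        z = r_xz * x + np.sqrt(1 - r_xz**2) * rng.standard_normal(n)  # correlated, no direct effect on y
        y = b * x + np.sqrt(1 - b**2) * rng.standard_normal(n)        # outcome driven by x alone
        # Observed proxy for x with var(x) / var(x_obs) equal to `reliability`.
        x_obs = x + np.sqrt((1 - reliability) / reliability) * rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x_obs, z])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se_z = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
        hits += int(abs(beta[2] / se_z) > 1.96)
    return hits / reps

for n in (50, 500, 2000):
    print(n, spurious_rejection_rate(n))
print("perfect reliability:", spurious_rejection_rate(2000, reliability=1.0))
```

    The spurious rejection rate should climb toward 1 as n grows, and stay near the nominal 5% when reliability=1.0, which matches the abstract's counterintuitive point that larger samples make the problem worse, not better.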

  41. 1936-stouffer.pdf: ⁠, Samuel A. Stouffer (1936; statistics):

    It is not generally recognized that such an analysis [using regression] assumes that each of the variables is perfectly measured, such that a second measure X′ᵢ of the variable measured by Xᵢ has a correlation of unity with Xᵢ. If some of the measures are more accurate than others, the analysis is impaired [by measurement error]. For example, the sociologist may have a problem in which an index of economic status and an index of nativity are independent variables. What is the effect, if the index of economic status is much less satisfactory than the index of nativity? Ordinarily, the effect will be to underestimate the [coefficient] of the less adequately measured variable and to overestimate the [coefficient] of the more adequately measured variable.

    If either the reliability or validity of an index is in question, at least two measures of the variable are required to permit an evaluation. The purpose of this paper is to provide a logical basis and a simple arithmetical procedure (a) for measuring the effect of the use of 2 indexes, each of one or more variables, in partial and multiple correlation analysis and (b) for estimating the likely effect if 2 indexes, not available, could be secured.
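    For background, the standard way two measures of the same variable are used is Spearman's correction for attenuation: the correlation between the two measures estimates each one's reliability, which then rescales the observed correlation. This is the textbook formula, not necessarily the exact procedure Stouffer derives for partial and multiple correlation:

```latex
r_{T_X T_Y} \;=\; \frac{r_{XY}}{\sqrt{r_{XX'}\, r_{YY'}}}
\qquad\text{e.g.}\qquad
\frac{0.30}{\sqrt{0.6 \times 0.9}} \approx 0.41
```

    Here r_XX′ and r_YY′ are the correlations between the two measures of X and of Y, and the numbers are invented. An observed correlation of 0.30 with reliabilities 0.6 and 0.9 implies a true-score correlation of about 0.41. In a multiple regression the less reliable variable's coefficient is attenuated in the same way, and the leftover signal can be absorbed by better-measured correlates, which is the bias Stouffer describes.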

  42. 1942-thorndike.pdf: Robert L. Thorndike

  43. 1965-kahneman.pdf: “Control of spurious association and the reliability of the controlled variable”⁠, Daniel Kahneman




  47. Regression

  48. ⁠, Stuart D. Ritchie, Timothy D. Bates, Ian J. Deary (2015):

    Previous research has indicated that education influences cognitive development, but it is unclear what, precisely, is being improved. Here, we tested whether education is associated with cognitive test score improvements via domain-general effects on general cognitive ability (g), or via domain-specific effects on particular cognitive skills. We conducted structural equation modeling on data from a large (n = 1,091), longitudinal sample, with a measure of intelligence at age 11 years and 10 tests covering a diverse range of cognitive abilities taken at age 70. Results indicated that the association of education with improved cognitive test scores is not mediated by g, but consists of direct effects on specific cognitive skills. These results suggest a decoupling of educational gains from increases in general intellectual capacity.


  50. ⁠, S. S. Agrawal, R. S. Ray (2012):

    The use of tobacco products as dentifrices is still prevalent in various parts of India. Tobacco use in dentifrices is a terrible scourge which motivates continued use despite its harmful effects. Indian legislation prohibits the use of nicotine in dentifrices. Nicotine is primarily injurious to people because it is responsible for tobacco addiction and is dependence-forming. The present study was motivated by an interest in examining the presence of nicotine in these dentifrices. Our earlier report indicated the presence of nicotine in toothpowders. To further curb the menace of tobacco, our team again analysed the toothpowder brands from previous years, as well as toothpastes. Eight brands of commonly used toothpastes and toothpowders were evaluated by gas chromatography-mass spectrometry. On the whole, there are a few successes but much remains to be done. Our findings indicated the presence of nicotine in two brands of dant manjans and four brands of toothpastes. Further, our findings underscore the need for stringent regulations by the regulatory authorities to prevent the addition of nicotine to these dentifrices. Hence government policy needs to target effective control of tobacco in these dentifrices, and the issue should be properly addressed.

  51. 2016-gubby.pdf: ⁠, Robin Gubby, David Wade, David Hoffer (2016-05-31; economics):

    Approximately 30 satellite launches are insured each year, and insurance coverage is provided for about 200 in-orbit satellites. The total insured exposure for these risks is currently in excess of US$25 billion. Commercial communications satellites in geostationary Earth orbit represent the majority of these, although a larger number of commercial imaging satellites, as well as the second-generation communication constellations, will see the insurance exposure in low Earth orbit start to increase in the years ahead, from its current level of US$1.5 billion. Regulations covering Lloyd’s of London syndicates require that each syndicate reserves funds to cover potential losses and to remain solvent. New regulations under the European Union’s Solvency II directive now require each syndicate to develop models for the classes of insurance provided to determine their own solvency capital requirements. Solvency II is expected to come into force in 2016 to ensure improved consumer protection, modernized supervision, deepened EU market integration, and increased international competitiveness of EU insurers. For each class of business, the inputs to the solvency capital requirements are determined not just on previous results, but also to reflect extreme cases where an unusual event or sequence of events exposes the syndicate to its theoretical worst-case loss. To assist syndicates covering satellites to reserve funds for such extreme space events, a series of realistic disaster scenarios (RDSs) has been developed that all Lloyd’s syndicates insuring space risks must report upon on a quarterly basis. The RDSs are regularly reviewed for their applicability and were recently updated to reflect changes within the space industry to incorporate such factors as consolidation in the supply chain and the greater exploitation of low Earth orbit. The development of these theoretical RDSs will be reviewed, along with the limitations of such scenarios. Changes in the industry that have warranted the recent update of the RDSs, and the impact such changes have had, will also be outlined. Finally, the article looks toward future industry developments that may require further amendments to the RDSs.



  54. index.html










  64. 1993-subotnik-geniusrevisited.pdf

  65. #gwern-hunter

  66. Movies#bridge-of-spies

  67. Anime#basilisk-kouga-ninpou-chou

  68. Anime#tonari-no-seki-kun

  69. Anime#michiko-to-hatchin