- Directories
- Links
- “MegaLMM: Mega-scale linear mixed models for genomic predictions with thousands of traits”, Runcie et al 2021
- “A parsimonious model for mass-univariate vertex-wise analysis”, Couvy-Duchesne et al 2021
- “Ensemble Learning of Convolutional Neural Network, Support Vector Machine, and Best Linear Unbiased Predictor for Brain Age Prediction: ARAMIS Contribution to the Predictive Analytics Competition 2019 Challenge”, Couvy-Duchesne et al 2020
- “Exploring the variance in complex traits captured by DNA methylation assays”, Battram et al 2020
- “A unified framework for association and prediction from vertex-wise grey-matter structure”, Couvy-Duchesne et al 2020
- “On Attenuated Interactions, Measurement Error, and Statistical Power: Guidelines for Social and Personality Psychologists”, Blake & Gangestad 2020
- “Using high-throughput phenotypes to enable genomic selection by inferring genotypes”, Whalen et al 2020
- “Randomized Experiments in Education, with Implications for Multilevel Causal Inference”, Raudenbush & Schwartz 2020
- “Analysis of variance when both input and output sets are high-dimensional”, Campos et al 2020
- “In-field whole plant maize architecture characterized by Latent Space Phenotyping”, Gage et al 2019
- “Widespread associations between grey matter structure and the human phenome”, Couvy-Duchesne et al 2019
- “Latent Space Phenotyping: Automatic Image-Based Phenotyping for Treatment Studies”, Ubbens et al 2019
- “Predicting human inhibitory control from brain structural MRI”, He et al 2019
- “Statistical Aspects of Wasserstein Distances”, Panaretos & Zemel 2019
- “Finite Mixture Models”, McLachlan et al 2019
- “Raincloud plots: a multi-platform tool for robust data visualization”, Allen et al 2018
- “Phenomic selection: a low-cost and high-throughput alternative to genomic selection”, Rincent et al 2018
- “Intact Connectional Morphometricity Learning Using Multi-view Morphological Brain Networks with Application to Autism Spectrum Disorder”, Bessadok & Rekik 2018
- “Environmental factors dominate over host genetics in shaping human gut microbiota composition”, Rothschild et al 2017
- “PinMe_PrateekCommentsIncluded_Aug_21.pdf”, arsalan 2017
- “Morphometricity as a measure of the neuroanatomical signature of a trait”, Sabuncu et al 2016
- “Exploring Factor Model Parameters across Continuous Variables with Local Structural Equation Models”, Hildebrandt et al 2016
- “Do scholars follow Betteridge’s Law? The use of questions in journal article titles”, Cook & Plourde 2016
- “Low-dose paroxetine exposure causes lifetime declines in male mouse body weight, reproduction and competitive ability as measured by the novel organismal performance assay”, Ruff et al 2015
- “
*Past, Present, and Future of Statistical Science*[COPSS 50^{th}anniversary anthology]”, Lin et al 2014 - “Psychological Measurement and Methodological Realism”, Hood 2013
- “Confidence Intervals for the Weighted Sum of Two Independent Binomial Proportions”, Decrouez & Robinson 2012
- “The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost”, Herculano-Houzel 2012
- “The Changing History of Robustness”, Stigler 2010
- “A new car-following model yielding log-normal type headways distributions”, 力) et al 2010 {#力)-et-al-2010-section}
- “Continuous Parameter Estimation Model: Expanding the Standard Statistical Paradigm”, Gorsuch 2005
- “William Sealy Gosset”, Fienberg & Lazar 2001
- “There Is a Time and a Place for Significance Testing”, Mulaik et al 1997
- “The Entropy of English Using PPM-Based Models”, Data Compression Conference (DCC ’96) Proceedings, 1996
- “Error rates in quadratic discrimination with constraints on the covariance matrices”, Flury et al 1994
- “Statistics as Rhetoric in Psychology”, John 1992
- “Another comment on O'Cinneide”, Mallows 1991
- “The Mean is within One Standard Deviation of Any Median”, O'Cinneide 1990 {#o’cinneide-1990-section}
- “1_1.tif”
- “Essence of Statistics (Second Edition)”, Loftus & Loftus 1982
- “Statistics in Britain 1865-1930: The Social Construction of Scientific Knowledge”, MacKenzie 1981
- “On Rereading R. A. Fisher [Fisher Memorial lecture, with comments]”, Savage et al 1976
- “Theory Confirmation in Psychology”, Swoyer & Monson 1975
- “On the alleged falsity of the null hypothesis”, Oakes 1975
- “On Prior Probabilities of Rejecting Statistical Hypotheses”, Keuth 1973
- “Control of spurious association and the reliability of the controlled variable”, Kahneman 1965
- “Social Statistics”, Blalock 1960
- “'Student' and Small Sample Theory”, Welch 1958
- “The Influence of 'Statistical Methods for Research Workers' on the Development of the Science of Statistics”, Yates 1951
- “85_1.tif”, Thorndike 1942
- “On the Non-Existence of Tests of "Student's" Hypothesis Having Power Functions Independent of σ”, Dantzig 1940
- “Professor Ronald Aylmer Fisher [profile]”, Mahalanobis 1938
- “Evaluating the Effect of Inadequately Measured Variables in Partial Correlation Analysis”, Stouffer 1936
- “The Method of Path Coefficients”, Wright 1934
- “The Limits of a Measure of Skewness”, Hotelling & Solomons 1932
- “Correlation Calculated from Faulty Data”, Spearman 1910
- “Some Experimental Results in the Correlation of Mental Abilities”, Brown 1910

- Miscellaneous

# Directories

# Links

## “MegaLMM: Mega-scale linear mixed models for genomic predictions with thousands of traits”, Runcie et al 2021

“MegaLMM: Mega-scale linear mixed models for genomic predictions with thousands of traits”, (2021-07-23; backlinks):

[Previously: `Grid-LMM` (Runcie & Crawford 2019).] Large-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as human genetics. However, the statistical foundation of multi-trait genomic prediction is based on the multivariate linear mixed effect model, a tool notorious for its fragility when applied to more than a handful of traits. We present `MegaLMM`, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits. Using 3 examples with real plant data, we show that `MegaLMM` can leverage thousands of traits at once to substantially improve genetic value prediction accuracy.

…Here, we describe `MegaLMM` (linear mixed models for millions of observations), a novel statistical method and computational algorithm for fitting massive-scale MvLMMs to large-scale phenotypic datasets. Although we focus on plant breeding applications for concreteness, our method can be broadly applied wherever multi-trait linear mixed models are used (e.g., human genetics, industrial experiments, psychology, linguistics, etc.). `MegaLMM` dramatically improves upon existing methods that fit low-rank MvLMMs, allowing multiple random effects and unbalanced study designs with large amounts of missing data. We achieve both scalability and statistical robustness by combining strong, but biologically motivated, Bayesian priors for statistical regularization—analogous to the p ≫ n approach of genomic prediction methods—with algorithmic innovations recently developed for LMMs. In the 3 examples below, we demonstrate that our algorithm maintains high predictive accuracy for tens-of-thousands of traits, and dramatically improves the prediction of genetic values over existing methods when applied to data from real breeding programs.

…Together, the set of parallel univariate LMMs and the set of factor loading vectors result in a novel and very general re-parameterization of the MvLMM framework as a mixed-effect factor model. This parameterization leads to dramatic computational performance gains by avoiding all large matrix inversions. It also serves as a scaffold for eliciting Bayesian priors that are intuitive and provide the powerful regularization necessary for robust performance with limited data. Our default prior distributions encourage: (1) shrinkage on the factor-trait correlations (λ_{jk}) to avoid over-fitting covariances, and (2) shrinkage on the factor sizes to avoid including too many latent traits. This 2-dimensional regularization helps the model focus only on the strongest, most relevant signals in the data.…

Model limitations: While `MegaLMM` works well across a wide range of applications in breeding programs, our approach does have some limitations. First, since `MegaLMM` is built on the Grid-LMM framework for efficient likelihood calculations^{22}, it does not scale well to large numbers of observations (in contrast to large numbers of traits), or to large numbers of random effects. As the number of observational units increases, `MegaLMM`’s memory requirements increase quadratically because of the requirement to store sets of pre-calculated inverse-variance matrices. Similarly, for each additional random effect term included in the model, memory requirements increase exponentially. Therefore, we generally limit models to fewer than 10,000 observations [n] and only 1-to-4 random effect terms per trait. There may be opportunities to reduce this memory burden if some of the random effects are low-rank; then these random effects could be updated on the fly using efficient routines for low-rank Cholesky updates. We also do not currently suggest including regressions directly on markers, and have used marker-based kinship matrices here instead for computational efficiency. Therefore, as a stand-alone prediction method, `MegaLMM` requires calculations involving the Schur complement of the joint kinship matrix of the testing and training individuals, which can be computationally costly.

Second, `MegaLMM` is inherently a linear model and cannot effectively model trait relationships that are non-linear. Some non-linear relationships between predictor variables (like genotypes) and traits can be modeled through non-linear kernel matrices, as we demonstrated with the `RKHS` application to the Bread Wheat data. However, allowing non-linear relationships among traits is currently beyond the capacity of our software and modeling approach. Extending our mixed effect model on the low-dimensional latent factor space to a non-linear modeling structure like a neural network may be an exciting area for future research. Also, some sets of traits may not have low-rank correlation structures that are well-approximated by a factor model. For example, certain auto-regressive dependence structures are low-rank but cannot efficiently be decomposed into a discrete set of factors.

Nevertheless, we believe that in its current form, `MegaLMM` will be useful to a wide range of researchers in quantitative genetics and plant breeding.
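The factor re-parameterization at the heart of `MegaLMM`—thousands of traits expressed as noisy linear combinations of a few latent factors—can be illustrated with a toy NumPy simulation (a hedged sketch under made-up dimensions, not the `MegaLMM` algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, t = 200, 5, 1000  # observations, latent factors, traits

# Latent factor scores F (n x K) and sparse factor loadings Lambda (K x t);
# the sparsity mask mimics the shrinkage priors on factor-trait loadings.
F = rng.normal(size=(n, K))
Lam = rng.normal(size=(K, t)) * (rng.random((K, t)) < 0.3)

# Observed traits: rank-K factor signal plus independent residual noise.
Y = F @ Lam + rng.normal(scale=0.5, size=(n, t))

# The t x t trait covariance is then dominated by a rank-K component, which
# is what lets a factor model stand in for an unconstrained t x t covariance:
eig = np.linalg.eigvalsh(np.cov(Y, rowvar=False))[::-1]
low_rank_share = eig[:K].sum() / eig.sum()
```

Fitting the real model means estimating F and Λ jointly with per-factor mixed-model variance components, which `MegaLMM` does by Gibbs sampling; the point of the sketch is only that a handful of factors can absorb most of a huge trait covariance.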

## “A parsimonious model for mass-univariate vertex-wise analysis”, Couvy-Duchesne et al 2021

“A parsimonious model for mass-univariate vertex-wise analysis”, (2021-01-22; backlinks):

Covariance between grey-matter measurements can reflect structural or functional brain networks, though it has also been shown to be influenced by confounding factors (e.g. age, head size, scanner), which could lead to lower mapping precision (increased size of associated clusters) and create distal false-positive associations in mass-univariate vertex-wise analyses.

We evaluated this concern by performing state-of-the-art mass-univariate analyses (general linear model, GLM) on traits simulated from real vertex-wise grey matter data (including cortical and subcortical thickness and surface area). We contrasted the results with those from linear mixed models (LMMs), which have been shown to overcome similar issues in omics association studies.

We showed that when performed on a large sample (n = 8,662, UK Biobank), GLMs yielded large spatial clusters of statistically-significant vertices and a greatly inflated false positive rate (Family Wise Error Rate: FWER = 1; cluster false discovery rate: FDR > 0.6). We showed that LMMs resulted in more parsimonious results: smaller clusters and a reduced false positive rate (yet FWER > 5% after Bonferroni correction), but at a cost of increased computation. In practice, the parsimony of LMMs results from controlling for the joint effect of all vertices, which prevents local and distal redundant associations from reaching statistical significance.

Next, we performed mass-univariate association analyses on 5 real UKB traits (age, sex, BMI, fluid intelligence and smoking status) and LMM yielded fewer and more localised associations. We identified 19 statistically-significant clusters displaying small associations with age, sex and BMI, which suggests a complex architecture of at least dozens of areas associated with those phenotypes.

## “Ensemble Learning of Convolutional Neural Network, Support Vector Machine, and Best Linear Unbiased Predictor for Brain Age Prediction: ARAMIS Contribution to the Predictive Analytics Competition 2019 Challenge”, Couvy-Duchesne et al 2020

“Ensemble Learning of Convolutional Neural Network, Support Vector Machine, and Best Linear Unbiased Predictor for Brain Age Prediction: ARAMIS Contribution to the Predictive Analytics Competition 2019 Challenge”, (2020-12-15; backlinks):

We ranked third in the Predictive Analytics Competition (PAC) 2019 challenge by achieving a mean absolute error (MAE) of 3.33 years in predicting age from T1-weighted MRI brain images. Our approach combined seven algorithms that allow generating predictions when the number of features exceeds the number of observations: in particular, two versions of best linear unbiased predictor (BLUP), support vector machine (SVM), two shallow convolutional neural networks (CNNs), and the famous ResNet and Inception V1. Ensemble learning was derived from estimating weights via linear regression in a hold-out subset of the training sample. We further evaluated and identified factors that could influence prediction accuracy: choice of algorithm, ensemble learning, and features used as input/MRI image processing. Our prediction error was correlated with age, and absolute error was greater for older participants, suggesting the need to increase the training sample for this subgroup. Our results may be used to guide researchers in building age predictors on healthy individuals, which can be used in research and in the clinic as non-specific predictors of disease status.
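The ensemble step described here—weights estimated by regressing the target on each base model’s predictions in a held-out subset—is linear stacking. A minimal sketch with simulated base predictors (the error profiles below are invented for illustration, not the ARAMIS models):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
age = rng.uniform(20, 80, n)  # true ages in the hold-out subset

# Simulated predictions from 3 base models with different bias/noise profiles:
preds = np.column_stack([
    age + rng.normal(0, 5, n),            # unbiased, moderate noise
    0.9 * age + 4 + rng.normal(0, 6, n),  # shrunk toward the mean, biased
    age + rng.normal(0, 8, n),            # unbiased, noisier
])

# Stacking: least-squares regression of the target on the base predictions.
X = np.column_stack([np.ones(n), preds])
w = np.linalg.lstsq(X, age, rcond=None)[0]
ensemble = X @ w

mae_single = [np.mean(np.abs(preds[:, j] - age)) for j in range(3)]
mae_ensemble = np.mean(np.abs(ensemble - age))
```

In practice the weights fitted on the hold-out would be applied to a separate test set; fitting and evaluating on the same subset, as in this sketch, slightly flatters the ensemble.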

[Keywords: brain age, MRI, machine learning, deep learning, statistical learning, ensemble learning]

…Morphometricity of Age as Upper Bound of Prediction Accuracy: From BLUP models, we estimated the total association between age and the brain features. Morphometricity is expressed as a proportion of the variance (R^{2}) of age; thus, it quantifies how much of the differences in age in the sample may be attributed to/associated with variation in brain structure. With surface-based processing (~650,000 vertices), we estimated the morphometricity to be R^{2} = 0.99 (SE = 0.052), while for volume-based processing (~480,000 voxels), it reached R^{2} = 0.97 (SE = 0.015).
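Morphometricity is the variance-component ratio from a linear mixed model in which subjects’ similarity over standardized brain features plays the role of a kinship matrix. A hedged sketch of the idea on simulated data, using simple Haseman–Elston moment regression rather than the REML/BLUP machinery used in the paper (all dimensions invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 1000  # subjects, standardized brain features
true_m2 = 0.6     # simulated morphometricity (variance share)

Z = rng.normal(size=(n, p))
Z = (Z - Z.mean(0)) / Z.std(0)
S = Z @ Z.T / p   # brain-similarity matrix (analogue of a kinship matrix)

# Phenotype = small linear effect of every feature + noise, at the target split:
b = rng.normal(scale=np.sqrt(true_m2 / p), size=p)
y = Z @ b + rng.normal(scale=np.sqrt(1 - true_m2), size=n)
y = (y - y.mean()) / y.std()

# Haseman-Elston: for standardized y, E[y_i * y_j] = m2 * S_ij (i != j), so a
# no-intercept regression of cross-products on similarities estimates m2.
iu = np.triu_indices(n, k=1)
m2_hat = float(S[iu] @ np.outer(y, y)[iu] / (S[iu] @ S[iu]))
```

REML (as in BLUP software) is more statistically efficient than this moment estimator, but the estimand—the phenotypic variance share captured by the brain features—is the same.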

## “Exploring the variance in complex traits captured by DNA methylation assays”, Battram et al 2020

“Exploring the variance in complex traits captured by DNA methylation assays”, (2020-10-10; backlinks):

Following years of epigenome-wide association studies (EWAS), traits analysed to date tend to yield few associations. Reinforcing this observation, we conducted EWAS on 400 traits and 16 yielded at least one association at the conventional significance threshold (p < 1×10^{−7}). To investigate why EWAS yield is low, we formally estimated the proportion of phenotypic variation captured by 421,693 blood-derived DNA methylation markers (h^{2}_{EWAS}) across all 400 traits. The mean h^{2}_{EWAS} was zero, with evidence for regular cigarette smoking exhibiting the largest association with all markers (h^{2}_{EWAS} = 0.42), and the only one surpassing a false discovery rate < 0.1. Though underpowered to determine the h^{2}_{EWAS} value for any one trait, h^{2}_{EWAS} was predictive of the number of EWAS hits across the traits analysed (AUC = 0.7). Modelling the contributions of the methylome on a per-site versus a per-region basis gave varied h^{2}_{EWAS} estimates (r = 0.47), but neither approach obtained substantially higher model fits across all traits. Our analysis indicates that most complex traits do not heavily associate with markers commonly measured in EWAS within blood. However, it is likely DNA methylation does capture variation in some traits, and h^{2}_{EWAS} may be a reasonable way to prioritise traits that are likely to yield associations.

## “A unified framework for association and prediction from vertex-wise grey-matter structure”, Couvy-Duchesne et al 2020

“A unified framework for association and prediction from vertex-wise grey-matter structure”, (2020-07-20; backlinks):

The recent availability of large-scale neuroimaging cohorts facilitates deeper characterisation of the relationship between phenotypic and brain architecture variation in humans. Here, we investigate the association (previously coined morphometricity) of a phenotype with all 652,283 vertex-wise measures of cortical and subcortical morphology in a large data set from the UK Biobank (UKB; n = 9,497 for discovery, n = 4,323 for replication) and the Human Connectome Project (n = 1,110).

We used a linear mixed model with the brain measures of individuals fitted as random effects, with covariance relationships estimated from the imaging data. We tested 167 behavioural, cognitive, psychiatric or lifestyle phenotypes and found statistically-significant morphometricity for 58 phenotypes (spanning substance use, blood assay results, education or income level, diet, depression, and cognition domains), 23 of which replicated in the UKB replication set or the HCP. We then extended the model to a bivariate analysis to estimate the grey-matter correlation between phenotypes, which revealed that body size (i.e., height, weight, BMI, waist and hip circumference, body fat percentage) could account for a substantial proportion of the morphometricity (confirmed using a conditional analysis), providing possible insight into previous MRI case-control results for psychiatric disorders where case status is associated with body mass index. Our LMM framework also allowed us to predict some of the associated phenotypes from the vertex-wise measures, in two independent samples. Finally, we demonstrated additional new applications of our approach: (a) region of interest (ROI) analysis that retains the vertex-wise complexity; (b) comparison of the information retained by different MRI processings.

## “On Attenuated Interactions, Measurement Error, and Statistical Power: Guidelines for Social and Personality Psychologists”, Blake & Gangestad 2020

`2020-blake.pdf`

: “On Attenuated Interactions, Measurement Error, and Statistical Power: Guidelines for Social and Personality Psychologists”, (2020-03-25; backlinks):

The replication crisis has seen increased focus on best-practice techniques to improve the reliability of scientific findings. What remains elusive to many researchers, and is frequently misunderstood, is that predictions involving interactions dramatically affect the calculation of statistical power. Using recent papers published in *Personality and Social Psychology Bulletin* (PSPB), we illustrate the pitfalls of improper power estimations in studies where attenuated interactions are predicted. Our investigation shows why even a programmatic series of 6 studies employing 2×2 designs, with samples exceeding n = 500, can be woefully underpowered to detect genuine effects. We also highlight the importance of accounting for error-prone measures when estimating effect sizes and calculating power, explaining why even positive results can mislead when power is low. We then provide five guidelines for researchers to avoid these pitfalls, including cautioning against the heuristic that a series of underpowered studies approximates the credibility of one well-powered study.

[Keywords: statistical power, effect size, fertility, ovulation, interaction effects]
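The attenuation the authors warn about can be reproduced in a few lines: an interaction coefficient is attenuated by roughly the product of the two predictors’ reliabilities, so power collapses much faster than intuition from main effects suggests. A hedged simulation sketch (the effect size, reliability, and sample size below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def interaction_power(n, beta_int=0.3, reliability=1.0, n_sims=500):
    """Simulated power of the t-test on an x1*x2 interaction term when
    both predictors are measured with the given reliability."""
    err_sd = np.sqrt(1 / reliability - 1)  # noise level giving that reliability
    hits = 0
    for _ in range(n_sims):
        x1, x2 = rng.normal(size=(2, n))
        y = x1 + x2 + beta_int * x1 * x2 + rng.normal(size=n)
        o1 = x1 + rng.normal(scale=err_sd, size=n)  # error-prone observed scores
        o2 = x2 + rng.normal(scale=err_sd, size=n)
        X = np.column_stack([np.ones(n), o1, o2, o1 * o2])
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ coef
        s2 = resid @ resid / (n - 4)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[3, 3])
        hits += abs(coef[3] / se) > 1.96  # ~two-sided alpha = 0.05
    return hits / n_sims

power_perfect = interaction_power(n=150, reliability=1.0)
power_noisy = interaction_power(n=150, reliability=0.7)
```

With reliability 0.7 on both predictors, the interaction effect seen in the regression is only about 0.7² ≈ half its true size, and power drops accordingly even though the main effects remain easy to detect.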

## “Using high-throughput phenotypes to enable genomic selection by inferring genotypes”, Whalen et al 2020

“Using high-throughput phenotypes to enable genomic selection by inferring genotypes”, (2020-03-02; backlinks):

In this paper we develop and test a method which uses high-throughput phenotypes to infer the genotypes of an individual. The inferred genotypes can then be used to perform genomic selection. Previous methods which used high-throughput phenotype data to increase the accuracy of selection assumed that the high-throughput phenotypes correlate with selection targets. When this is not the case, we show that the high-throughput phenotypes can be used to determine which haplotypes an individual inherited from their parents, and thereby infer the individual’s genotypes. We tested this method in two simulations. In the first simulation, we explored how the accuracy of the inferred genotypes depended on the high-throughput phenotypes used and the genome of the species analysed. In the second simulation, we explored whether using this method could increase genetic gain in a plant breeding program by enabling genomic selection on non-genotyped individuals. In the first simulation, we found that genotype accuracy was higher if more high-throughput phenotypes were used and if those phenotypes had higher heritability. We also found that genotype accuracy decreased with increasing size of the species’ genome. In the second simulation, we found that the inferred genotypes could be used to enable genomic selection on non-genotyped individuals and increase genetic gain compared to random selection, or in some scenarios phenotypic selection. This method presents a novel way of using high-throughput phenotype data in breeding programs. As the quality of high-throughput phenotypes increases and their cost decreases, this method may enable the use of genomic selection on large numbers of non-genotyped individuals.

## “Randomized Experiments in Education, with Implications for Multilevel Causal Inference”, Raudenbush & Schwartz 2020

`2020-raudenbush.pdf`

: “Randomized Experiments in Education, with Implications for Multilevel Causal Inference”, (2020-03-01):

Education research has experienced a methodological renaissance over the past two decades, with a new focus on large-scale randomized experiments. This wave of experiments has made education research an even more exciting area for statisticians, unearthing many lessons and challenges in experimental design, causal inference, and statistics more broadly. Importantly, educational research and practice almost always occur in a multilevel setting, which makes the statistics relevant to other fields with this structure, including social policy, health services research, and clinical trials in medicine. In this article we first briefly review the history that led to this new era in education research and describe the design features that dominate the modern large-scale educational experiments. We then highlight some of the key statistical challenges in this area, including endogeneity of design, heterogeneity of treatment effects, noncompliance with treatment assignment, mediation, generalizability, and spillover. Though a secondary focus, we also touch on promising trial designs that answer more nuanced questions, such as the SMART design for studying dynamic treatment regimes and factorial designs for optimizing the components of an existing treatment.

## “Analysis of variance when both input and output sets are high-dimensional”, Campos et al 2020

“Analysis of variance when both input and output sets are high-dimensional”, (2020-02-15; backlinks):

Motivation: Modern genomic data sets often involve multiple data-layers (e.g., DNA-sequence, gene expression), each of which itself can be high-dimensional. The biological processes underlying these data-layers can lead to intricate multivariate association patterns.

Results: We propose and evaluate two methods for analysis of variance when both input and output sets are high-dimensional. Our approach uses random effects models to estimate the proportion of variance of vectors in the linear span of the output set that can be explained by regression on the input set. We consider a method based on an orthogonal basis (Eigen-ANOVA) and one that uses random vectors (Monte Carlo ANOVA, MC-ANOVA) in the linear span of the output set. We used simulations to assess the bias and variance of each of the methods, and to compare them with Partial Least Squares (PLS), an approach commonly used in multivariate high-dimensional regressions. The MC-ANOVA method gave nearly unbiased estimates in all the simulation scenarios considered. Estimates produced by Eigen-ANOVA and PLS had noticeable biases. Finally, we demonstrate the insight that can be obtained with MC-ANOVA and Eigen-ANOVA by applying these two methods to the study of multi-locus linkage disequilibrium in chicken genomes and to the assessment of inter-dependencies between gene expression, methylation and copy-number variants in data from breast cancer tumors.

Availability: The Supplementary data include an R implementation of each of the proposed methods, as well as the scripts used in the simulations and in the real-data analyses.

Contact: gustavoc@msu.edu

Supplementary information: Supplementary data are available at *Bioinformatics* online.
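The MC-ANOVA idea—average the R² of regressions of random vectors drawn from the linear span of the output layer onto the input layer—is compact enough to sketch directly (the simulated data and dimensions are invented; this shows the concept, not the authors’ R implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p_in, p_out = 300, 50, 40

# Input layer X and an output layer Y partly determined by X:
X = rng.normal(size=(n, p_in))
Y = X @ (rng.normal(size=(p_in, p_out)) * 0.15) + rng.normal(size=(n, p_out))

def mc_anova_r2(X, Y, n_vectors=200):
    """Mean share of variance of random vectors in the column span of Y
    that is captured by least-squares projection onto the span of X."""
    Q, _ = np.linalg.qr(X - X.mean(0))  # orthonormal basis for span(X)
    shares = []
    for _ in range(n_vectors):
        v = Y @ rng.normal(size=Y.shape[1])  # random vector in span(Y)
        v = v - v.mean()
        fitted = Q @ (Q.T @ v)               # projection onto span(X)
        shares.append(fitted @ fitted / (v @ v))
    return float(np.mean(shares))

r2_hat = mc_anova_r2(X, Y)
```

Note that, like any in-sample R², this moment estimate is somewhat inflated when the input dimension is a large fraction of n; the paper’s random-effects formulation is what corrects for that.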

## “In-field whole plant maize architecture characterized by Latent Space Phenotyping”, Gage et al 2019

“In-field whole plant maize architecture characterized by Latent Space Phenotyping”, (2019-09-10; backlinks):

Collecting useful, interpretable, and biologically relevant phenotypes in a resource-efficient manner is a bottleneck to plant breeding, genetic mapping, and genomic prediction. Autonomous and affordable sub-canopy rovers are an efficient and scalable way to generate sensor-based datasets of in-field crop plants. Rovers equipped with light detection and ranging (LiDar) can produce three-dimensional reconstructions of entire hybrid maize fields. In this study, we collected 2,103 LiDar scans of hybrid maize field plots and extracted phenotypic data from them by Latent Space Phenotyping (LSP). We performed LSP by two methods, principal component analysis (PCA) and a convolutional autoencoder, to extract meaningful, quantitative Latent Space Phenotypes (LSPs) describing whole-plant architecture and biomass distribution. The LSPs had heritabilities of up to 0.44, similar to some manually measured traits, indicating they can be selected on or genetically mapped. Manually measured traits can be successfully predicted by using LSPs as explanatory variables in partial least squares regression, indicating the LSPs contain biologically relevant information about plant architecture. These techniques can be used to assess crop architecture at a reduced cost and in an automated fashion for breeding, research, or extension purposes, as well as to create or inform crop growth models.
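The PCA variant of Latent Space Phenotyping reduces each flattened scan to its scores on the top principal components, which then serve as quantitative phenotypes. A toy sketch in NumPy (simulated “scans” driven by two hidden architecture traits; the dimensions and noise levels are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n_plots, n_features = 300, 1024  # field plots, flattened LiDar-like features

# Simulate scans whose variation is driven by two hidden architecture traits:
height = rng.normal(size=n_plots)
biomass = rng.normal(size=n_plots)
basis = rng.normal(size=(2, n_features))
scans = (np.outer(height, basis[0]) + np.outer(biomass, basis[1])
         + rng.normal(scale=0.5, size=(n_plots, n_features)))

# Latent Space Phenotypes: per-plot scores on the top-2 principal components.
centered = scans - scans.mean(0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
lsp = centered @ Vt[:2].T

# A manually measured trait should then be recoverable from the LSPs
# (the paper uses partial least squares; plain least squares suffices here):
X = np.column_stack([np.ones(n_plots), lsp])
pred = X @ np.linalg.lstsq(X, height, rcond=None)[0]
r2 = 1 - np.sum((height - pred) ** 2) / np.sum((height - height.mean()) ** 2)
```

The LSPs are arbitrary rotations of the underlying traits, which is why the paper validates them indirectly, via heritability and prediction of manually measured traits, rather than by direct interpretation.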

## “Widespread associations between grey matter structure and the human phenome”, Couvy-Duchesne et al 2019

“Widespread associations between grey matter structure and the human phenome”, (2019-07-09; backlinks):

The recent availability of large-scale neuroimaging cohorts (here the UK Biobank [UKB] and the Human Connectome Project [HCP]) facilitates deeper characterisation of the relationship between phenotypic and brain architecture variation in humans. We tested the association between 654,386 vertex-wise measures of cortical and subcortical morphology (from T1w and T2w MRI images) and behavioural, cognitive, psychiatric and lifestyle data. We found a statistically-significant association of grey-matter structure with 58 out of 167 UKB phenotypes, spanning substance use, blood assay results, education or income level, diet, depression, being a twin, as well as cognition domains (UKB discovery sample: n = 9,888). Twenty-three of the 58 associations replicated (UKB replication sample: n = 4,561; HCP, n = 1,110). In addition, differences in body size (height, weight, BMI, waist and hip circumference, body fat percentage) could account for a substantial proportion of the association, providing possible insight into previous MRI case-control studies for psychiatric disorders where case status is associated with body mass index. Using the same linear mixed model, we showed that most of the associated characteristics (e.g. age, sex, body size, diabetes, being a twin, maternal smoking) could be significantly predicted using all the brain measurements in out-of-sample prediction. Finally, we demonstrated other applications of our approach, including a Region Of Interest (ROI) analysis that retains the vertex-wise complexity, and a ranking of the information contained across MRI processing options.

Highlights:

- Our linear mixed model approach unifies association and prediction analyses for highly dimensional vertex-wise MRI data
- Grey-matter structure is associated with measures of substance use, blood assay results, education or income level, diet, depression, being a twin, as well as cognition domains
- Body size (height, weight, BMI, waist and hip circumference) is an important source of covariation between the phenome and grey-matter structure
- Grey-matter scores quantify grey-matter-based risk for the associated traits and allow the study of phenotypes not collected
- The most general cortical processing (“fsaverage” mesh with no smoothing) maximises the brain-morphometricity for all UKB phenotypes

## “Latent Space Phenotyping: Automatic Image-Based Phenotyping for Treatment Studies”, Ubbens et al 2019

“Latent Space Phenotyping: Automatic Image-Based Phenotyping for Treatment Studies”, (2019-04-15; backlinks):

Association mapping studies have enabled researchers to identify candidate loci for many important environmental resistance factors, including agronomically relevant resistance traits in plants. However, traditional genome-by-environment studies such as these require a phenotyping pipeline which is capable of accurately and consistently measuring stress responses, typically in an automated high-throughput context using image processing. In this work, we present Latent Space Phenotyping (LSP), a novel phenotyping method which is able to automatically detect and quantify response-to-treatment directly from images. Using two synthetically generated image datasets, we first show that LSP is able to successfully recover the simulated QTL in both simple and complex synthetic imagery. We then demonstrate an example application with an interspecific cross of the model C_{4} grass *Setaria*. We propose LSP as an alternative to traditional image analysis methods for phenotyping, enabling association mapping studies without the need for engineering complex image processing pipelines.

## “Predicting human inhibitory control from brain structural MRI”, He et al 2019

`2019-he.pdf`

: “Predicting human inhibitory control from brain structural MRI”, Ningning He, Edmund T. Rolls, Wei Zhao, Shuixia Guo (backlinks)

## “Statistical Aspects of Wasserstein Distances”, Panaretos & Zemel 2019

`2019-panaretos.pdf`

: “Statistical Aspects of Wasserstein Distances”, (2019-01-01):

Wasserstein distances are metrics on probability distributions inspired by the problem of optimal mass transportation. Roughly speaking, they measure the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution. They are ubiquitous in mathematics, with a long history that has seen them catalyze core developments in analysis, optimization, and probability. Beyond their intrinsic mathematical richness, they possess attractive features that make them a versatile tool for the statistician: They can be used to derive weak convergence and convergence of moments, and can be easily bounded; they are well-adapted to quantify a natural notion of perturbation of a probability distribution; and they seamlessly incorporate the geometry of the domain of the distributions in question, thus being useful for contrasting complex objects. Consequently, they frequently appear in the development of statistical theory and inferential methodology, and they have recently become an object of inference in themselves. In this review, we provide a snapshot of the main concepts involved in Wasserstein distances and optimal transportation, and a succinct overview of some of their many statistical aspects.
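On the real line the optimal transport plan is monotone: match the i-th smallest point of one sample to the i-th smallest of the other. That makes empirical Wasserstein distances one line of code (a sketch assuming equal sample sizes for simplicity):

```python
import numpy as np

def wasserstein_p(x, y, p=1):
    """p-Wasserstein distance between two equal-size empirical distributions
    on the real line: sort both samples and average the matched gaps."""
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y) ** p) ** (1 / p))

rng = np.random.default_rng(6)
a = rng.normal(size=10_000)

# Translating a distribution by c moves it by exactly c in any W_p:
shift = wasserstein_p(a, a + 3.0)
```

For p = 1 and equal sample sizes this agrees with `scipy.stats.wasserstein_distance`; unequal sizes require integrating the quantile-function difference rather than a one-to-one matching.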

## “Finite Mixture Models”, McLachlan et al 2019

`2019-mclachlan.pdf`

: “Finite Mixture Models”, (2019-01-01):

The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and general scientific literature. The aim of this article is to provide an up-to-date account of the theory and methodological developments underlying the applications of finite mixture models. Because of their flexibility, mixture models are being increasingly exploited as a convenient, semiparametric way in which to model unknown distributional shapes. This is in addition to their obvious applications where there is group-structure in the data or where the aim is to explore the data for such structure, as in a cluster analysis. It has now been three decades since the publication of the monograph by McLachlan & Basford (1988) with an emphasis on the potential usefulness of mixture models for inference and clustering. Since then, mixture models have attracted the interest of many researchers and have found many new and interesting fields of application. Thus, the literature on mixture models has expanded enormously, and as a consequence, the bibliography here can only provide selected coverage.
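The semiparametric density-estimation use of mixtures described above is easy to make concrete. Below is a hedged, textbook-style EM sketch for a two-component univariate Gaussian mixture (my illustration; the variable names and initialization scheme are mine, not the authors'):

```python
# Hedged textbook sketch of EM for a two-component Gaussian mixture.
import math, random

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_2gauss(data, iters=200):
    # crude initialization from the data's range
    mu1, mu2 = min(data), max(data)
    sd1 = sd2 = (max(data) - min(data)) / 4 or 1.0
    pi1 = 0.5
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point
        r = [pi1 * norm_pdf(x, mu1, sd1) /
             (pi1 * norm_pdf(x, mu1, sd1) + (1 - pi1) * norm_pdf(x, mu2, sd2))
             for x in data]
        n1 = sum(r); n2 = len(data) - n1
        # M-step: responsibility-weighted means, SDs, and mixing weight
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        sd1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1) or 1e-6
        sd2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2) or 1e-6
        pi1 = n1 / len(data)
    return pi1, (mu1, sd1), (mu2, sd2)

random.seed(0)
data = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(5, 1) for _ in range(300)]
pi1, c1, c2 = em_2gauss(data)
# The two recovered means should land near 0 and 5 (component order may swap).
```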

## “Raincloud plots: a multi-platform tool for robust data visualization”, Allen et al 2018

“Raincloud plots: a multi-platform tool for robust data visualization”, (2018-08-23):

Across scientific disciplines, there is a rapidly growing recognition of the need for more statistically robust, transparent approaches to data visualization. Complementary to this, many scientists have realized the need for plotting tools that accurately and transparently convey key aspects of statistical effects and raw data with minimal distortion.

Previously common approaches, such as plotting conditional mean or median barplots together with error-bars have been criticized for distorting effect size, hiding underlying patterns in the raw data, and obscuring the assumptions upon which the most commonly used statistical tests are based.

Here we describe a data visualization approach which overcomes these issues, providing maximal statistical information while preserving the desired ‘inference at a glance’ nature of barplots and other similar visualization devices. These “raincloud plots” [scatterplots + smoothed histograms/density plot + box plots] can visualize raw data, probability density, and key summary statistics such as median, mean, and relevant confidence intervals in an appealing and flexible format with minimal redundancy.

In this tutorial paper we provide basic demonstrations of the strength of raincloud plots and similar approaches, outline potential modifications for their optimal use, and provide open-source code for their streamlined implementation in R, Python and Matlab. Readers can investigate the R and Python tutorials interactively in the browser using Binder by Project Jupyter.

…To remedy these shortcomings, a variety of visualization approaches have been proposed, illustrated in Figure 2, below. One simple improvement is to overlay individual observations (datapoints) beside the standard bar-plot format, typically with some degree of randomized jitter to improve visibility (Figure 2A). Complementary to this approach, others have advocated for more statistically robust illustrations such as box plots (Tukey 1970), which display the sample median alongside the interquartile range. Dot plots can be used to combine a histogram-like display of distribution with individual data observations (Figure 2B). In many cases, particularly when parametric statistics are used, it is desirable to plot the distribution of observations. This can reveal valuable information about how, e.g., some condition may increase the skewness or change the overall shape of a distribution. In this case, the ‘violin plot’ (Figure 2C), which displays a probability density function of the data mirrored about the uninformative axis, is often preferred (Hintze & Nelson 1998). With the advent of increasingly flexible and modular plotting tools such as ggplot2 (Wickham 2010; Wickham & Chang 2008), all of the aforementioned techniques can be combined in a complementary fashion. Indeed, this combined approach is typically desirable, as each of these visualization techniques has various trade-offs.

…On the other hand, the interpretation of dot plots depends heavily on the choice of dot-bin and dot-size, and these plots can also become extremely difficult to read when there are many observations. The violin plot, in which the probability density function (PDF) of the observations is mirrored, combined with overlaid box plots, has recently become a popular alternative. This provides both an assessment of the data distribution and statistical inference at a glance (SIG) via the overlaid box plots^{3}. However, there is nothing to be gained, statistically speaking, by mirroring the PDF in the violin plot, which therefore violates the philosophy of minimizing the “data-ink ratio” (Tufte 1983)^{4}.

To overcome these issues, we propose the use of the ‘raincloud plot’ (Neuroconscience 2018), illustrated in Figure 3: the raincloud plot combines a wide range of visualization suggestions, and similar precursors have been used in various publications (e.g., Ellison 1993, Figure 2.4; Wilson et al 2018). The plot attempts to address the aforementioned limitations in an intuitive, modular, and statistically robust format. In essence, raincloud plots combine a ‘split-half violin’ (an un-mirrored PDF plotted against the redundant data axis), raw jittered data points, and a standard visualization of central tendency (i.e., mean or median) and error, such as a boxplot. As such, the raincloud plot builds on code elements from multiple developers and scientific programming languages (Hintze & Nelson 1998; Patil 2018; Wickham & Chang 2008; Wilke 2017).
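As a concrete gloss on the three layers just described, here is a hedged pure-Python sketch computing the ingredients of a raincloud: an un-mirrored density "cloud", jittered "rain" points, and boxplot summary statistics. The rendering step, which the paper's R/Python/Matlab code handles, is omitted, and all function names are mine:

```python
# Hedged sketch of the three raincloud layers (computation only, no plotting).
import math, random, statistics

def kde(data, grid, bw):
    """Gaussian kernel density estimate: the un-mirrored 'cloud'."""
    n = len(data)
    return [sum(math.exp(-0.5 * ((g - x) / bw) ** 2) for x in data)
            / (n * bw * math.sqrt(2 * math.pi)) for g in grid]

def box_stats(data):
    """Median and interquartile range: the overlaid boxplot layer."""
    q1, med, q3 = statistics.quantiles(data, n=4)
    return {"q1": q1, "median": med, "q3": q3}

def jitter(data, width=0.1, seed=0):
    """Random offsets for the raw data points: the 'rain'."""
    rng = random.Random(seed)
    return [(x, rng.uniform(-width, width)) for x in data]

random.seed(1)
sample = [random.gauss(10, 2) for _ in range(200)]
grid = [6 + 0.1 * i for i in range(81)]
cloud = kde(sample, grid, bw=0.5)   # density curve over the grid
stats = box_stats(sample)           # median/IQR for the box
rain = jitter(sample)               # (value, jitter-offset) pairs
```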

## “Phenomic selection: a low-cost and high-throughput alternative to genomic selection”, Rincent et al 2018

“Phenomic selection: a low-cost and high-throughput alternative to genomic selection”, (2018-04-16; backlinks):

Genomic selection—the prediction of breeding values using DNA polymorphisms—is a disruptive method that has widely been adopted by animal and plant breeders to increase crop, forest and livestock productivity and ultimately secure food and energy supplies. It improves breeding schemes in different ways, depending on the biology of the species and genotyping and phenotyping constraints. However, both genomic selection and classical phenotypic selection remain difficult to implement because of the high genotyping and phenotyping costs that typically occur when selecting large collections of individuals, particularly in early breeding generations. To specifically address these issues, we propose a new conceptual framework called phenomic selection, which consists of a prediction approach based on low-cost and high-throughput phenotypic descriptors rather than DNA polymorphisms. We applied phenomic selection on two species of economic interest (wheat and poplar) using near-infrared spectroscopy on various tissues. We showed that one could reach accurate predictions in independent environments for developmental and productivity traits and tolerance to disease. We also demonstrated that under realistic scenarios, one could expect much higher genetic gains with phenomic selection than with genomic selection. Our work constitutes a proof of concept and is the first attempt at phenomic selection; it clearly provides new perspectives for the breeding community, as this approach is theoretically applicable to any organism and does not require any genotypic information.

## “Intact Connectional Morphometricity Learning Using Multi-view Morphological Brain Networks with Application to Autism Spectrum Disorder”, Bessadok & Rekik 2018

`2018-bessadok.pdf`

: “Intact Connectional Morphometricity Learning Using Multi-view Morphological Brain Networks with Application to Autism Spectrum Disorder”, Alaa Bessadok, Islem Rekik (backlinks)

## “Environmental factors dominate over host genetics in shaping human gut microbiota composition”, Rothschild et al 2017

“Environmental factors dominate over host genetics in shaping human gut microbiota composition”, (2017-06-26; backlinks):

Human gut microbiome composition is shaped by multiple host intrinsic and extrinsic factors, but the relative contribution of host genetic compared to environmental factors remains elusive. Here, we genotyped a cohort of 696 healthy individuals from several distinct ancestral origins and a relatively common environment, and demonstrate that there is no statistically-significant association between microbiome composition and ethnicity, single nucleotide polymorphisms (SNPs), or overall genetic similarity, and that only 5 of 211 (2.4%) previously reported microbiome-SNP associations replicate in our cohort. In contrast, we find similarities in the microbiome composition of genetically unrelated individuals who share a household. We define the term “biome-explainability” as the variance of a host phenotype explained by the microbiome after accounting for the contribution of human genetics. Consistent with our finding that microbiome and host genetics are largely independent, we find significant biome-explainability levels of 16–33% for body mass index (BMI), fasting glucose, high-density lipoprotein (HDL) cholesterol, waist circumference, waist-hip ratio (WHR), and lactose consumption. We further show that several human phenotypes can be predicted substantially more accurately when adding microbiome data to host genetics data, and that the contribution of both data sources to prediction accuracy is largely additive. Overall, our results suggest that human microbiome composition is dominated by environmental factors rather than by host genetics.

## “PinMe: Tracking a Smartphone User around the World”, Mosenia et al 2017

`2017-mosenia.pdf`

: “PinMe: Tracking a Smartphone User around the World”, Arsalan Mosenia, Xiaoliang Dai, Prateek Mittal, Niraj K. Jha (backlinks)

## “Morphometricity as a measure of the neuroanatomical signature of a trait”, Sabuncu et al 2016

“Morphometricity as a measure of the neuroanatomical signature of a trait”, (2016-09-09; backlinks):

Neuroimaging has largely focused on 2 goals: mapping associations between neuroanatomical features and phenotypes, and building individual-level prediction models. This paper presents a complementary analytic strategy, called “morphometricity”, that aims to measure the neuroanatomical signatures of different phenotypes. Inspired by prior work on [genetic] heritability, we define morphometricity as the proportion of phenotypic variation that can be explained by brain morphology (e.g., as captured by structural brain MRI). In the dawning era of large-scale datasets comprising traits across a broad phenotypic spectrum, morphometricity will be critical in prioritizing and characterizing behavioral, cognitive, and clinical phenotypes based on their neuroanatomical signatures. Furthermore, the proposed framework will be important in dissecting the functional, morphological, and molecular underpinnings of different traits.

…Complex physiological and behavioral traits, including neurological and psychiatric disorders, often associate with distributed anatomical variation. This paper introduces a global metric, called morphometricity, as a measure of the anatomical signature of different traits. Morphometricity is defined as the proportion of phenotypic variation that can be explained by macroscopic brain morphology.

We estimate morphometricity via a linear mixed-effects model that uses an anatomical similarity matrix computed based on measurements derived from structural brain MRI scans. We examined over 3,800 unique MRI scans from 9 large-scale studies to estimate the morphometricity of a range of phenotypes, including clinical diagnoses such as Alzheimer’s disease, and nonclinical traits such as measures of cognition.

Our results demonstrate that morphometricity can provide novel insights about the neuroanatomical correlates of a diverse set of traits, revealing associations that might not be detectable through traditional statistical techniques.

[Keywords: neuroimaging, brain morphology, statistical association]
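The variance-partition idea can be sketched with a moment estimator. The authors fit a linear mixed-effects model (REML) with an anatomical similarity matrix; the hedged toy below instead uses a simpler Haseman-Elston-style regression of phenotype cross-products on similarity entries, which targets the same quantity under simulated data (all names and parameter values are mine):

```python
# Hedged toy (not the paper's REML fit): estimate morphometricity
# m^2 = sigma^2_a / var(y) by regressing phenotype cross-products
# y_i*y_j on anatomical-similarity entries K_ij for i != j.
import random

random.seed(0)
n, p = 200, 150                      # subjects, morphological features
Z = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# simulate a phenotype with ~60% of its variance tied to morphology
beta = [random.gauss(0, (0.6 / p) ** 0.5) for _ in range(p)]
y = [sum(z * b for z, b in zip(Z[i], beta)) + random.gauss(0, 0.4 ** 0.5)
     for i in range(n)]
ybar = sum(y) / n
y = [v - ybar for v in y]            # center the phenotype

def K(i, j):
    """Anatomical similarity: inner product of standardized features."""
    return sum(zi * zj for zi, zj in zip(Z[i], Z[j])) / p

# regression through the origin over all off-diagonal pairs, using
# the moment condition E[y_i * y_j] = sigma^2_a * K_ij
num = den = 0.0
for i in range(n):
    for j in range(i + 1, n):
        kij = K(i, j)
        num += kij * y[i] * y[j]
        den += kij * kij
sigma2_a = num / den
var_y = sum(v * v for v in y) / (n - 1)
m2 = sigma2_a / var_y                # should land near the simulated 0.6
```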

## “Exploring Factor Model Parameters across Continuous Variables with Local Structural Equation Models”, Hildebrandt et al 2016

`2016-hildebrandt.pdf`

: “Exploring Factor Model Parameters across Continuous Variables with Local Structural Equation Models”, (2016-04-06):

Using an empirical data set, we investigated variation in factor model parameters across a continuous moderator variable and demonstrated three modeling approaches: multiple-group mean and covariance structure (MGMCS) analyses, local structural equation modeling (LSEM), and moderated factor analysis (MFA). We focused on how to study variation in factor model parameters as a function of continuous variables such as age, socioeconomic status, ability levels, acculturation, and so forth. Specifically, we formalized the LSEM approach in detail as compared with previous work and investigated its statistical properties with an analytical derivation and a simulation study. We also provide code for the easy implementation of LSEM. The illustration of methods was based on cross-sectional cognitive ability data from individuals ranging in age from 4 to 23 years. Variations in factor loadings across age were examined with regard to the age differentiation hypothesis. LSEM and MFA converged with respect to the conclusions. When there was a broad age range within groups and varying relations between the indicator variables and the common factor across age, MGMCS produced distorted parameter estimates. We discuss the pros of LSEM compared with MFA and recommend using the two tools as complementary approaches for investigating moderation in factor model parameters.

[Keywords: local structural equation model, moderated factor analysis, multiple-group mean and covariance structures, age differentiation of cognitive abilities, WJ-III tests of cognitive abilities]
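The core LSEM move, reweighting the sample with a kernel centered at each focal moderator value before fitting the model, can be sketched as follows. This is a hedged toy with simulated data, not the authors' implementation, and only the locally-weighted covariance step (the input to a factor-model fit) is shown:

```python
# Hedged LSEM sketch: kernel weights around a focal moderator value,
# then a weighted covariance to which a factor model would be fitted.
import math, random

def kernel_weights(moderator, focal, bw):
    """Gaussian kernel weights centered at the focal moderator value."""
    return [math.exp(-0.5 * ((m - focal) / bw) ** 2) for m in moderator]

def weighted_cov(x, y, w):
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return sum(wi * (xi - mx) * (yi - my)
               for wi, xi, yi in zip(w, x, y)) / sw

# Toy data: the factor loading (and hence indicator covariance) grows with age.
random.seed(0)
age = [random.uniform(4, 23) for _ in range(2000)]
f = [random.gauss(0, 1) for _ in range(2000)]           # common factor
load = [0.2 + 0.03 * a for a in age]                    # age-varying loading
x1 = [l * fi + random.gauss(0, 1) for l, fi in zip(load, f)]
x2 = [l * fi + random.gauss(0, 1) for l, fi in zip(load, f)]

w_young = kernel_weights(age, focal=6, bw=2)
w_old = kernel_weights(age, focal=21, bw=2)
# Sliding the focal value traces how the local covariance (and the
# factor loadings estimated from it) varies continuously with age.
```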

## “Do scholars follow Betteridge’s Law? The use of questions in journal article titles”, Cook & Plourde 2016

`2016-cook.pdf`

: “Do scholars follow Betteridge’s Law? The use of questions in journal article titles”, James M. Cook, Dawn Plourde

## “Low-dose paroxetine exposure causes lifetime declines in male mouse body weight, reproduction and competitive ability as measured by the novel organismal performance assay”, Ruff et al 2015

`2015-gaukler.pdf`

: “Low-dose paroxetine exposure causes lifetime declines in male mouse body weight, reproduction and competitive ability as measured by the novel organismal performance assay”, (2015; backlinks):

Paroxetine is a selective serotonin reuptake inhibitor (SSRI) that is currently available on the market and is suspected of causing congenital malformations in babies born to mothers who take the drug during the first trimester of pregnancy.

We utilized “organismal performance assays” (OPAs), a novel toxicity assessment method, to assess the safety of paroxetine during pregnancy in a rodent model. OPAs utilize genetically diverse wild mice (*Mus musculus*) to evaluate competitive performance between experimental and control animals as they compete amongst each other for limited resources in semi-natural enclosures. Performance measures included reproductive success, male competitive ability and survivorship.

Paroxetine-exposed males weighed 13% less, had 44% fewer offspring, dominated 53% fewer territories and experienced a 2.5-fold increased trend in mortality, when compared with controls. Paroxetine-exposed females had 65% fewer offspring early in the study, but rebounded at later time points. In cages, paroxetine-exposed breeders took 2.3 times longer to produce their first litter and pups of both sexes experienced reduced weight when compared with controls. Low-dose paroxetine-induced health declines detected in this study were undetected in preclinical trials with doses 2.5–8 times higher than human therapeutic doses.

These data indicate that OPAs detect phenotypic adversity and provide unique information that could be useful for safety testing during pharmaceutical development.

[Keywords: intraspecific competition, pharmacodynamics, reproductive success, semi-natural enclosures, SSRI, toxicity assessment]

## “*Past, Present, and Future of Statistical Science* [COPSS 50^{th} anniversary anthology]”, Lin et al 2014

`2014-copss-pastpresentfuturestatistics.pdf`

: “*Past, Present, and Future of Statistical Science* [COPSS 50^{th} anniversary anthology]”, (2014-03-26; backlinks):

*Past, Present, and Future of Statistical Science* was commissioned in 2013 by the Committee of Presidents of Statistical Societies (COPSS) to celebrate its 50^{th} anniversary and the International Year of Statistics. COPSS consists of five charter member statistical societies in North America and is best known for sponsoring prestigious awards in statistics, such as the COPSS Presidents’ Award. Through the contributions of a distinguished group of 50 statisticians who are past winners of at least one of the five awards sponsored by COPSS, this volume showcases the breadth and vibrancy of statistics, describes current challenges and new opportunities, highlights the exciting future of statistical science, and provides guidance to future generations of statisticians. The book is not only about statistics and science but also about people and their passion for discovery. Distinguished authors present expository articles on a broad spectrum of topics in statistical education, research, and applications. Topics covered include reminiscences and personal reflections on statistical careers, perspectives on the field and profession, thoughts on the discipline and the future of statistical science, and advice for young statisticians. Many of the articles are accessible not only to professional statisticians and graduate students but also to undergraduate students interested in pursuing statistics as a career and to all those who use statistics in solving real-world problems. A consistent theme of all the articles is the passion for statistics enthusiastically shared by the authors. Their success stories inspire, give a sense of statistics as a discipline, and provide a taste of the exhilaration of discovery, success, and professional accomplishment.

“This collection of reminiscences, musings on the state of the art, and advice for young statisticians makes for compelling reading. There are 52 contributions from eminent statisticians who have won a Committee of Presidents of Statistical Societies award. Each is a short, focused chapter and so one could even say this is ideal bedtime (or coffee break) reading. Anyone interested in the history of statistics will know that much has been written about the early days but little about the field since the Second World War. This book goes some way to redress this and is all the more valuable for coming from the horse’s mouth…the closing chapter, the shortest of all, from Brad Efron: a list of “thirteen rules for giving a really bad talk”. This made me laugh out loud and should be posted on the walls of all conferences. I shall leave the final word to Peter Bickel: ‘We should glory in this time when statistical thinking pervades almost every field of endeavor. It is really a lot of fun.’”

―Robert Grant, in *Significance*, April 2017

The History of COPSS: “A brief history of the Committee of Presidents of Statistical Societies (COPSS)”, Ingram Olkin

Reminiscences and Personal Reflections on Career Paths: “Reminiscences of the Columbia University Department of Mathematical Statistics in the late 1940s”, Ingram Olkin · “A career in statistics”, Herman Chernoff · “. . . how wonderful the field of statistics is . . .”, David R. Brillinger · “An unorthodox journey to statistics: Equity issues, remarks on multiplicity”, Juliet Popper Shaffer · “Statistics before and after my COPSS Prize”, Peter J. Bickel · “The accidental biostatistics professor”, Donna Brogan · “Developing a passion for statistics”, Bruce G. Lindsay · “Reflections on a statistical career and their implications”, R. Dennis Cook · “Science mixes it up with statistics”, Kathryn Roeder · “Lessons from a twisted career path”, Jeffrey S. Rosenthal · “Promoting equity”, Mary Gray

Perspectives on the Field and Profession: “Statistics in service to the nation”, Stephen E. Fienberg · “Where are the majors?”, Iain M. Johnstone · “We live in exciting times”, Peter Hall · “The bright future of applied statistics”, Rafael A. Irizarry · “The road travelled: From a statistician to a statistical scientist”, Nilanjan Chatterjee · “Reflections on a journey into statistical genetics and genomics”, Xihong Lin · “Reflections on women in statistics in Canada”, Mary E. Thompson · “The whole women thing”, Nancy Reid · “Reflections on diversity”, Louise Ryan

Reflections on the Discipline: “Why does statistics have two theories?”, Donald A. S. Fraser · “Conditioning is the issue”, James O. Berger · “Statistical inference from a Dempster-Shafer perspective”, Arthur P. Dempster · “Nonparametric Bayes”, David B. Dunson · “How do we choose our default methods?”, Andrew Gelman · “Serial correlation and Durbin-Watson bounds”, T. W. Anderson · “A non-asymptotic walk in probability and statistics”, Pascal Massart · “The past’s future is now: What will the present’s future bring?”, Lynne Billard · “Lessons in biostatistics”, Norman E. Breslow · “A vignette of discovery”, Nancy Flournoy · “Statistics and public health research”, Ross L. Prentice · “Statistics in a new era for finance and health care”, Tze Leung Lai · “Meta-analyses: Heterogeneity can be a good thing”, Nan M. Laird · “Good health: Statistical challenges in personalizing disease prevention”, Alice S. Whittemore · “Buried treasures”, Michael A. Newton · “Survey sampling: Past controversies, current orthodoxy, future paradigms”, Roderick J. A. Little · “Environmental informatics: Uncertainty quantification in the environmental sciences”, Noel A. Cressie · “A journey with statistical genetics”, Elizabeth Thompson · “Targeted learning: From MLE to TMLE”, Mark van der Laan · “Statistical model building, machine learning, and the ah-ha moment”, Grace Wahba · “In praise of sparsity and convexity”, Robert J. Tibshirani · “Features of Big Data and sparsest solution in high confidence set”, Jianqing Fan · “Rise of the machines”, Larry A. Wasserman · “A trio of inference problems that could win you a Nobel Prize in statistics (if you help fund it)”, Xiao-Li Meng

Advice for the Next Generation: “Inspiration, aspiration, ambition”, C. F. Jeff Wu · “Personal reflections on the COPSS Presidents’ Award”, Raymond J. Carroll · “Publishing without perishing and other career advice”, Marie Davidian · “Converting rejections into positive stimuli”, Donald B. Rubin · “The importance of mentors”, Donald B. Rubin · “Never ask for or give advice, make mistakes, accept mediocrity, enthuse”, Terry Speed · “Thirteen rules”, Bradley Efron

## “Psychological Measurement and Methodological Realism”, Hood 2013

`2013-hood.pdf`

: “Psychological Measurement and Methodological Realism”, S. Brian Hood

## “Confidence Intervals for the Weighted Sum of Two Independent Binomial Proportions”, Decrouez & Robinson 2012

`2012-decrouez.pdf`

: “Confidence Intervals for the Weighted Sum of Two Independent Binomial Proportions”, (2012-09-01):

Confidence intervals for the difference of two binomial proportions are well known, however, confidence intervals for the weighted sum of two binomial proportions are less studied. We develop and compare 7 methods for constructing confidence intervals for the weighted sum of 2 independent binomial proportions. The interval estimates are constructed by inverting the Wald test, the score test and the Likelihood ratio test. The weights can be negative, so our results generalize those for the difference between two independent proportions. We provide a numerical study that shows that these confidence intervals based on large-sample approximations perform very well, even when a relatively small amount of data is available. The intervals based on the inversion of the score test showed the best performance. Finally, we show that as for the difference of two binomial proportions, adding four pseudo-outcomes to the Wald interval for the weighted sum of two binomial proportions improves its coverage substantially, and we provide a justification for this correction.

[Keywords: border security, leakage survey, likelihood ratio test, quarantine inspection, score test, small sample, sum of proportions, Wald test]
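A hedged sketch of the simplest of the compared intervals, the Wald interval for w₁p₁ + w₂p₂, together with the "four pseudo-outcomes" adjustment, interpreted here Agresti-Caffo-style as one pseudo-success and one pseudo-failure added to each sample (the paper's exact correction may differ):

```python
# Hedged sketch: Wald-type confidence interval for a weighted sum of two
# independent binomial proportions, with an optional plus-four adjustment.
import math

def wald_weighted(x1, n1, x2, n2, w1, w2, z=1.96, adjust=False):
    if adjust:  # add 1 success + 1 failure to each sample (4 pseudo-outcomes)
        x1, n1, x2, n2 = x1 + 1, n1 + 2, x2 + 1, n2 + 2
    p1, p2 = x1 / n1, x2 / n2
    est = w1 * p1 + w2 * p2
    se = math.sqrt(w1 ** 2 * p1 * (1 - p1) / n1
                   + w2 ** 2 * p2 * (1 - p2) / n2)
    return est - z * se, est + z * se

# Weights may be negative, so the familiar difference of two proportions
# (w1 = 1, w2 = -1) is a special case.
lo, hi = wald_weighted(12, 40, 20, 50, w1=1.0, w2=-1.0)
lo_adj, hi_adj = wald_weighted(12, 40, 20, 50, w1=1.0, w2=-1.0, adjust=True)
```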

## “The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost”, Herculano-Houzel 2012

`2012-herculanohouzel.pdf`

: “The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost”, (2012-06-19; backlinks):

[Herculano-Houzel 2009] Neuroscientists have become used to a number of “facts” about the human brain: It has 100 billion neurons and 10- to 50-fold more glial cells; it is larger than expected for its body size among primates and mammals in general, and therefore the most cognitively able; it consumes an outstanding 20% of the total body energy budget despite representing only 2% of body mass, because of an increased metabolic need of its neurons; and it is endowed with an overdeveloped cerebral cortex, the largest compared with brain size.

These facts led to the widespread notion that the human brain is literally extraordinary: an outlier among mammalian brains, defying evolutionary rules that apply to other species, with a uniqueness seemingly necessary to justify the superior cognitive abilities of humans over mammals with even larger brains. These facts, with deep implications for neurophysiology and evolutionary biology, are not grounded on solid evidence or sound assumptions, however.

Our recent development of a method that allows rapid and reliable quantification of the numbers of cells that compose the whole brain has provided a means to verify these facts. Here, I review this recent evidence and argue that, with 86 billion neurons and just as many nonneuronal cells, the human brain is a scaled-up primate brain in its cellular composition and metabolic cost, with a relatively enlarged cerebral cortex that does not have a relatively larger number of brain neurons yet is remarkable in its cognitive abilities and metabolism simply because of its extremely large number of neurons.

## “The Changing History of Robustness”, Stigler 2010

`2010-stigler.pdf`

: “The Changing History of Robustness”, Stephen M. Stigler

## “A new car-following model yielding log-normal type headways distributions”, Li et al 2010 {#li-et-al-2010-section}

`2010-li.pdf`

: “A new car-following model yielding log-normal type headways distributions”, Li Li (李 力), Wang Fa (王 法), Jiang Rui (姜 锐), Hu Jian-Ming (胡坚明), Ji Yan (吉 岩) (backlinks)

## “Continuous Parameter Estimation Model: Expanding the Standard Statistical Paradigm”, Gorsuch 2005

`2005-gorsuch.pdf`

: “Continuous Parameter Estimation Model: Expanding the Standard Statistical Paradigm”, Richard L. Gorsuch

## “William Sealy Gosset”, Fienberg & Lazar 2001

`2001-fienberg.pdf`

: “William Sealy Gosset”, Stephen E. Fienberg, Nicole Lazar

## “There Is a Time and a Place for Significance Testing”, Mulaik et al 1997

`1997-muzaik.pdf`

: “There Is a Time and a Place for Significance Testing”, Stanley A. Mulaik, Nambury S. Raju, Richard A. Harshman (backlinks)

## “The Entropy of English Using PPM-based Models”, Teahan & Cleary 1996

`1996-teahan.pdf`

: “The Entropy of English Using PPM-based Models”, W. J. Teahan, John G. Cleary (Data Compression Conference, DCC '96, Proceedings) (backlinks)

## “Error rates in quadratic discrimination with constraints on the covariance matrices”, Flury et al 1994

`1994-flury.pdf`

: “Error rates in quadratic discrimination with constraints on the covariance matrices”, (1994-03-01):

In multivariate discrimination of several normal populations, the optimal classification procedure is based on quadratic discriminant functions.

We compare expected error rates of the quadratic classification procedure if the covariance matrices are estimated under the following 4 models: (1) arbitrary covariance matrices, (2) common principal components, (3) proportional covariance matrices, and (4) identical covariance matrices.

Using Monte Carlo simulation to estimate expected error rates, we study the performance of the 4 discrimination procedures for 5 different parameter setups corresponding to “standard” situations that have been used in the literature. The procedures are examined for sample sizes ranging from 10 to 60, and for 2 to 4 groups.

Our results quantify the extent to which a parsimonious method reduces error rates, and demonstrate that choosing a simple method of discrimination is often beneficial even if the underlying model assumptions are wrong.

[Keywords: common principal components, linear discriminant function, Monte Carlo simulation, proportional covariance matrices]

## “Statistics as Rhetoric in Psychology”, John 1992

`1992-john.pdf`

: “Statistics as Rhetoric in Psychology”, I. D. John

## “Another comment on O'Cinneide”, Mallows 1991

`1990-mallows.pdf`

: “Another comment on O'Cinneide”, Colin Mallows

## “The Mean is within One Standard Deviation of Any Median”, O'Cinneide 1990 {#o’cinneide-1990-section}

`1990-ocinneide.pdf`

: “The Mean is within One Standard Deviation of Any Median”, Colm Art O'Cinneide
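The titular inequality admits a short standard proof (my reconstruction, not necessarily the paper's own argument): for mean $\mu$, any median $m$, and standard deviation $\sigma$,

$$|\mu - m| = |\mathbb{E}[X - m]| \le \mathbb{E}|X - m| \le \mathbb{E}|X - \mu| \le \sqrt{\mathbb{E}[(X-\mu)^2]} = \sigma,$$

where the middle step uses the fact that any median minimizes $c \mapsto \mathbb{E}|X - c|$, and the last step is Cauchy-Schwarz (or Jensen's inequality).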

## “1_1.tif”

`1983-kolmogorov.pdf`

: “1_1.tif”

## “Essence of Statistics (Second Edition)”, Loftus & Loftus 1982

`1982-loftus-essenceofstatistics.pdf`

: “Essence of Statistics (Second Edition)”, Geoffry R. Loftus, Elizabeth F. Loftus (backlinks)

## “Statistics in Britain 1865-1930: The Social Construction of Scientific Knowledge”, MacKenzie 1981

`1981-mackenzie-statisticsinbritain18651930.pdf`

: “Statistics in Britain 1865-1930: The Social Construction of Scientific Knowledge”, Donald A. MacKenzie

## “On Rereading R. A. Fisher [Fisher Memorial lecture, with comments]”, Savage et al 1976

`1976-savage.pdf`

: “On Rereading R. A. Fisher [Fisher Memorial lecture, with comments]”, Leonard J. Savage, John Pratt, Bradley Efron, Churchill Eisenhart, Bruno de Finetti, D. A. S. Fraser, V. P. Godambe, I. J. Good, O. Kempthorne, Stephen M. Stigler, I. Richard Savage

## “Theory Confirmation in Psychology”, Swoyer & Monson 1975

`1975-swoyer.pdf`

: “Theory Confirmation in Psychology”, Chris Swoyer, Thomas C. Monson (backlinks)

## “On the alleged falsity of the null hypothesis”, Oakes 1975

`1975-oakes.pdf`

: “On the alleged falsity of the null hypothesis”, William F. Oakes (backlinks)

## “On Prior Probabilities of Rejecting Statistical Hypotheses”, Keuth 1973

`1973-keuth.pdf`

: “On Prior Probabilities of Rejecting Statistical Hypotheses”, Herbert Keuth (backlinks)

## “Control of spurious association and the reliability of the controlled variable”, Kahneman 1965

`1965-kahneman.pdf`

: “Control of spurious association and the reliability of the controlled variable”, Daniel Kahneman (backlinks)

## “Social Statistics”, Blalock 1960

`1960-blalock-socialstatistics.pdf`

: “Social Statistics”, Hubert M. Blalock, Jr.

## “'Student' and Small Sample Theory”, Welch 1958

`1958-welch.pdf`

: “'Student' and Small Sample Theory”, B. L. Welch

## “The Influence of 'Statistical Methods for Research Workers' on the Development of the Science of Statistics”, Yates 1951

`1951-yates.pdf`

: “The Influence of 'Statistical Methods for Research Workers' on the Development of the Science of Statistics”, Francis Yates (backlinks)

## Thorndike 1942

`1942-thorndike.pdf`

: Robert L. Thorndike (backlinks)

## “On the Non-Existence of Tests of "Student's" Hypothesis Having Power Functions Independent of σ”, Dantzig 1940

`1940-dantzig.pdf`

: “On the Non-Existence of Tests of "Student's" Hypothesis Having Power Functions Independent of σ”, George B. Dantzig

## “Professor Ronald Aylmer Fisher [profile]”, Mahalanobis 1938

`1938-mahalanobis.pdf`

: “Professor Ronald Aylmer Fisher [profile]”, P. C. Mahalanobis

## “Evaluating the Effect of Inadequately Measured Variables in Partial Correlation Analysis”, Stouffer 1936

`1936-stouffer.pdf`

: “Evaluating the Effect of Inadequately Measured Variables in Partial Correlation Analysis”, Samuel A. Stouffer (1936; backlinks):

It is not generally recognized that such an analysis [using regression] assumes that each of the variables is perfectly measured, such that a second measure X′_i, of the variable measured by X_i, has a correlation of unity with X_i. If some of the measures are more accurate than others, the analysis is impaired [by measurement error]. For example, the sociologist may have a problem in which an index of economic status and an index of nativity are independent variables. What is the effect, if the index of economic status is much less satisfactory than the index of nativity? Ordinarily, the effect will be to underestimate the [coefficient] of the less adequately measured variable and to overestimate the [coefficient] of the more adequately measured variable. If either the reliability or validity of an index is in question, at least two measures of the variable are required to permit an evaluation. The purpose of this paper is to provide a logical basis and a simple arithmetical procedure (a) for measuring the effect of the use of 2 indexes, each of one or more variables, in partial and multiple correlation analysis and (b) for estimating the likely effect if 2 indexes, not available, could be secured.
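The attenuation Stouffer describes is easy to reproduce by simulation (a minimal sketch, not from the paper: the variable names and the true model `y = x1 + x2 + noise` with reliability 0.5 on the first index are illustrative assumptions):

```python
# Simulate Stouffer's point: measurement error on one of two correlated
# predictors shrinks its own OLS coefficient and inflates the other's.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + np.sqrt(0.75) * rng.normal(size=n)  # corr(x1, x2) = 0.5
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)        # true coefficients: 1, 1
x1_obs = x1 + rng.normal(size=n)                    # noisy index, reliability 0.5

def ols(predictors, y):
    """Return OLS slopes (intercept dropped)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

b_true = ols([x1, x2], y)      # both slopes ≈ 1.0
b_err = ols([x1_obs, x2], y)   # slope on x1_obs shrinks, slope on x2 inflates
print(b_true, b_err)
```

With these population values the biased slopes work out to roughly 0.43 on the noisy index and 1.29 on the clean one, rather than 1 and 1.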

## “The Method of Path Coefficients”, Wright 1934

`1934-wright.pdf`

: “The Method of Path Coefficients”, Sewall Wright

## “The Limits of a Measure of Skewness”, Hotelling & Solomons 1932

`1932-hotelling.pdf`

: “The Limits of a Measure of Skewness”, Harold Hotelling, Leonard M. Solomons (1932-05-01):

[Peyman: The mean (μ) and median (m) of any distribution with finite variance are at most one standard deviation (σ) apart:

|μ − m| ≤ σ

The result was also written up by C. Mallows in “Another comment on O’Cinneide”, *The American Statistician* 45 (3), using Jensen’s inequality twice. If the distribution is unimodal, the bound tightens to |μ − m| ≤ √(3⁄5)·σ.]
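Both bounds can be sanity-checked numerically (a sketch; the exponential distribution is just one convenient unimodal, skewed test case):

```python
# Check |mean − median| ≤ σ (Hotelling–Solomons) and the tighter
# unimodal bound |mean − median| ≤ sqrt(3/5)·σ on a skewed sample.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # unimodal, right-skewed

gap = abs(x.mean() - np.median(x))  # ≈ 1 − ln 2 ≈ 0.307 for Exponential(1)
sigma = x.std()                     # ≈ 1 for Exponential(1)

assert gap <= sigma                  # general bound
assert gap <= np.sqrt(3 / 5) * sigma # unimodal bound (√(3/5) ≈ 0.775)
print(gap, sigma)
```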

## “Correlation Calculated from Faulty Data”, Spearman 1910

`1910-spearman.pdf`

: “Correlation Calculated from Faulty Data”, Charles Spearman

## “Some Experimental Results in the Correlation of Mental Abilities”, Brown 1910

`1910-brown.pdf`

: “Some Experimental Results in the Correlation of Mental Abilities”, William Brown

# Miscellaneous

`https://www.sciencedirect.com/science/article/pii/S0896627317310929` (backlinks)

`https://www.nature.com/articles/s41467-019-10317-7` (backlinks)

`https://www.lesswrong.com/posts/6miu9BsKdoAi72nkL/a-contamination-theory-of-the-obesity-epidemic?commentId=9gMgjDQ5xgbxe8aLd` (backlinks)

`https://surveyanon.wordpress.com/2019/07/22/playing-around-with-gendermetricity/` (backlinks)

`http://matthewtoews.com/papers/IPMI2019_BoF_Manifold_Laurent.pdf` (backlinks)

`2001-francesrichard-obsessivegeneroustowardadiagramofmarklombardi.html` (backlinks)

`1973-meshalkin-collectionofproblemsinprobabilitytheory.pdf`

`1957-feller-anintroductiontoprobabilitytheoryanditsapplications.pdf` (backlinks)