
statistics directory

Links

“Fake Journal Club: Teaching Critical Reading”, Branwen 2022

Fake-Journal-Club: “Fake Journal Club: Teaching Critical Reading”⁠, Gwern Branwen (2022-03-07; backlinks; similar):

Discussion of how to teach active reading and questioning of scientific research. Partially fake research papers may teach a critical attitude. Various ideas for games reviewed.

How do researchers transition from uncritically absorbing research papers or arguments to actively grappling with and questioning them? Most learn this meta-cognitive skill informally or by ad hoc mechanisms like being tutored by a mentor, or watching others critique papers at a ‘journal club’. This patchwork may not always work or be the best approach, as it is slow and largely implicit; similar to calibration training in statistical forecasting, targeted training may be able to teach the skill rapidly.

To teach this active reading attitude of not believing everything you read, I borrow the pedagogical strategy of deliberately inserting errors which the student must detect, proposing fake research articles which could be read in a ‘fake journal club’.

Faking entire articles is a lot of work and so I look at variations on it. I suggest that NN language models like GPT-3 have gotten good enough to, for short passages, provide a challenge for human readers, and that one could create a fake journal club by having a language model repeatedly complete short passages of research articles (possibly entirely fictional ones).

This would provide difficult criticism problems with rapid feedback, scalability to arbitrarily many users, and great flexibility in content.
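As a rough illustration of the proposed setup (not Gwern’s implementation), one could use an off-the-shelf language model via the Hugging Face transformers library to fabricate continuations of a genuine passage; the prompt text below is invented for the example:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library is installed: generate a
# fake continuation of a (here invented) abstract opening, to be mixed with the real continuation.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # a stronger model could be substituted

real_opening = ("We randomized 120 classrooms to either a standard curriculum or a "
                "spaced-repetition curriculum and measured retention after six months.")
fake = generator(real_opening, max_new_tokens=60, do_sample=True, temperature=0.8)[0]["generated_text"]
print(fake)  # shown to 'fake journal club' readers alongside the genuine continuation
```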

“Fooled by Beautiful Data: Visualization Aesthetics Bias Trust in Science, News, and Social Media”, Lin & Thornton 2022

“Fooled by beautiful data: Visualization aesthetics bias trust in science, news, and social media”⁠, Chujun Lin, Mark Allen Thornton (2022-01-04; backlinks; similar):

4 preregistered studies show that beauty increases trust in graphs from scientific papers, news, and social media.

Scientists, policymakers, and the public increasingly rely on data visualizations—such as COVID tracking charts, weather forecast maps, and political polling graphs—to inform important decisions. The aesthetic decisions of graph-makers may produce graphs of varying visual appeal, independent of data quality.

Here we tested whether the beauty of a graph influences how much people trust it. Across 3 studies, we sampled graphs from social media, news reports, and scientific publications, and consistently found that graph beauty predicted trust. In a 4th study, we manipulated both the graph beauty and misleadingness.

We found that beauty, but not actual misleadingness, causally affected trust.

These findings reveal a source of bias in the interpretation of quantitative data and indicate the importance of promoting data literacy in education. [Particularly worrisome given how effective statistics design is ignored by designers optimizing only for beauty⁠.]

[Keywords: aesthetics, beauty-is-good stereotype/​halo effect⁠, causal effects, data visualizations, publication bias, public trust]

…Here we test the hypothesis that the beauty of data visualizations influences how much people trust them. We first examined the correlation between perceived beauty and trust in graphs. To maximize the generalizability and external validity of our findings, we systematically sampled graphs (Figure 1) of diverse types and topics (Figure 2) from the real world. These graphs spanned a wide range of domains, including social media (Study 1), news reports (Study 2), and scientific publications (Study 3). We asked participants how beautiful they thought the graphs looked and how much they trusted the graphs. We also measured how much participants found the graphs interesting, understandable, surprising, and negative, to control for potential confounds (Figure 3A). In addition to predicting trust ratings, we also examined whether participants’ beauty ratings predicted real-world impact. We measured impact using indices including the number of comments the graphs received on social media, and the number of citations the graphs’ associated papers had. Finally, we tested the causal effect of graph beauty on trust by generating graphs using arbitrary data (Study 4). We orthogonally manipulated both the beauty and the actual misleadingness of these graphs and measured how these manipulations affected trust.

Results: Beauty correlates with trust across domains. We found that participants’ trust in graphs was associated with how beautiful participants thought the graphs looked, for graphs across all 3 domains (Figure 3B): social media posts on Reddit (Pearson’s r = 0.45, p = 4.15×10−127 in Study 1a; r = 0.41, p = 3.28×10−231 in Study 1b), news reports (r = 0.43, p = 1.14×10−278 in Study 2), and scientific papers (r = 0.41, p = 6×10−234 in Study 3). These findings indicate that, across diverse contents and sources of the graphs, perceived beauty and trust in graphs are reliably correlated in the minds of perceivers. The association between beauty and trust remained robust when controlling for factors that might influence both perceived beauty and trust, including how much participants thought the graphs were interesting, understandable, surprising, and negative (linear mixed modeling: b = 0.19, standardized 𝛽 = 0.22, p = 1.05×10−30 in Study 1a; b = 0.14, 𝛽 = 0.16, p = 8.81×10−46 in Study 1b; b = 0.14, 𝛽 = 0.15, p = 5.35×10−35 in Study 2; b = 0.10, 𝛽 = 0.12, p = 1.85×10−25 in Study 3; see Figure 1 for the coefficients of covariates). These findings indicate that beautiful visualizations predict increased trust even when controlling for the effects of interesting topics, understandable presentation, confirmation bias⁠, and negativity bias⁠.
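The covariate-adjusted models reported above are standard linear mixed models; a minimal sketch of the analysis structure (simulated data and effect sizes, not the authors’ code or dataset) might look like:

```python
# Hypothetical sketch: trust regressed on beauty plus one covariate, random intercept per
# participant, using statsmodels' MixedLM. Data are simulated; coefficients are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for participant in range(50):
    intercept = rng.normal(0, 0.5)                         # participant-level random intercept
    for _ in range(20):                                    # 20 graphs per participant
        beauty = rng.normal(4, 1.5)
        interesting = rng.normal(4, 1.5)
        trust = 2 + 0.2 * beauty + 0.1 * interesting + intercept + rng.normal(0, 1)
        rows.append(dict(participant=participant, beauty=beauty,
                         interesting=interesting, trust=trust))
df = pd.DataFrame(rows)

model = smf.mixedlm("trust ~ beauty + interesting", df, groups=df["participant"]).fit()
print(model.summary())   # the 'beauty' fixed effect plays the role of b/𝛽 in the paper
```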

Figure 3: Correlations between beauty and trust in Studies 1–3. (A) Participants viewed each graph (top; an example from Study 3) and rated each graph on 6 aspects (bottom; the order was randomized). (B) The frequency of ratings (colored; presented with 2D kernel density) on the beauty and trust of the graphs in Studies 1a, 1b, 2, and 3 (from top to bottom), and univariate correlations between the 2 variables (line for linear regression, text for Pearson’s correlation, asterisks indicate statistical-significance: ✱✱✱ for p < 0.001; n = 2,681 in Study 1a; n = 5,780 in Study 1b; n = 6,204 in Study 2; n = 6,030 in Study 3).

Beauty predicts real-world popularity: We found that the real-world popularity of the graphs was associated with how beautiful participants thought they were. The more beautiful graphs from Reddit were associated with higher numbers of comments in both Study 1a (b = 0.04, 𝛽 = 0.04, p = 0.011) and Study 1b (b = 0.11, 𝛽 = 0.12, p = 2.84×10−22). The more beautiful graphs from scientific journals were associated with papers that had higher numbers of citations in Study 3 (b = 0.07, 𝛽 = 0.05, p = 0.001; but not higher numbers of views, b = 0.03, 𝛽 = 0.02, p = 0.264). The association between the perceived beauty of a paper’s graphs and the paper’s number of citations remained robust when controlling for the paper’s publication date and how much participants thought the graphs were interesting, understandable, surprising, and negative (b = 0.05, 𝛽 = 0.04, p = 0.005). These findings suggest that people’s bias in favor of trusting beautiful graphs has real-world consequences.

Figure 4: Causal effects of beauty on trust in Study 4. (A) Manipulations of an example graph of a specific type and topic in 4 experimental conditions. (B) Manipulation check of beauty. Linear mixed model regression of beauty ratings (7-point Likert scale) on beauty manipulations (binary), while controlling for the manipulations of misleadingness and the random effects of participants, graph types, and graph topics (n = 2,574 observations). (C) Causal effects of beauty and misleadingness. Linear mixed model regression of trust ratings (7-point Likert scale) on beauty and misleadingness manipulations (binary), while controlling for the random effects of participants, graph types, and graph topics (n = 2,574 observations).

Discussion: …A second, non-mutually exclusive, explanation suggests that this apparent bias may be rooted in rational thinking. More beautiful graphs may indicate that the data is of higher quality and that the graph maker is more skillful [Steele & Iliinsky 2010, Beautiful Visualization: Looking at Data through the Eyes of Experts]. However, our results suggest that this reasoning may not be accurate. It does not require sophisticated techniques to make beautiful graphs: we reliably made graphs look more beautiful simply by increasing their resolution and color saturation, and using a legible, professional font (Figure 4A–B). Findings from the real-world graphs (Studies 1–3) also suggest that one could make a very basic graph such as a bar plot look very beautiful (Figure S2F). Visual inspection of the more and less beautiful real-world graphs suggests that people perceive graphs with more colors (eg. rainbow colors), shapes (eg. cartoons, abstract shapes), and meaningful text (eg. a title explaining the meaning of the graph) as more beautiful. Nor does it require high-quality data to make a beautiful graph: we generated graphs that were perceived as beautiful using arbitrary data (Figure 4B). Therefore, our findings highlight that the beauty of a graph may not be an informative cue for its quality. Even if beauty were correlated with actual data quality in the real world, this would be a dangerous and fallible heuristic to rely upon for evaluating research and media.

“Identifying Imaging Genetic Associations via Regional Morphometricity Estimation”, Bao et al 2022

“Identifying imaging genetic associations via regional morphometricity estimation”⁠, Jingxuan Bao, Zixuan Wen, Mansu Kim, Andrew J. Saykin, Paul M. Thompson, Yize Zhao, Li Shen (2022-01-03; backlinks; similar):

Brain imaging genetics is an emerging research field aiming to reveal the genetic basis of brain traits captured by imaging data. Inspired by heritability analysis, the concept of morphometricity was recently introduced to assess trait association with whole brain morphology.

In this study, we extend the concept of morphometricity from its original definition at the whole brain level to a more focal level based on a region of interest (ROI). We propose a novel framework to identify the SNP-ROI association via regional morphometricity estimation of each studied single nucleotide polymorphism (SNP).

We perform an empirical study on the structural MRI and genotyping data from a landmark Alzheimer’s disease (AD) biobank and yield promising results.

Our findings indicate that the AD-related SNPs have higher overall regional morphometricity estimates than the SNPs not yet related to AD. This observation suggests that the variance of AD SNPs can be explained more by regional morphometric features than that of non-AD SNPs, supporting the value of imaging traits as targets in studying AD genetics. Also, we identified 11 ROIs where the AD/non-AD status of SNPs and the statistical significance of their morphometricity estimates in these ROIs show strong dependency. Supplementary motor area (SMA) and dorsolateral prefrontal cortex (DPC) are enriched among these ROIs.

Our results also demonstrate that using all the detailed voxel-level measures within the ROI to incorporate morphometric information outperforms using only a single average ROI measure, and thus provides improved power to detect imaging genetic associations.

[Keywords: brain imaging genetics, regional morphometricity, Alzheimer’s disease]

“Autism-related Dietary Preferences Mediate Autism-gut Microbiome Associations”, Yap et al 2021

2021-yap.pdf: “Autism-related dietary preferences mediate autism-gut microbiome associations”⁠, Chloe X. Yap, Anjali K. Henders, Gail A. Alvares, David L. A. Wood, Lutz Krause, Gene W. Tyson, Restuadi Restuadi et al (2021-11-11; backlinks; similar):

  • Limited autism-microbiome associations from stool metagenomics of n = 247 children
  • Romboutsia timonensis was the only taxon associated with autism diagnosis
  • Autistic traits such as restricted interests are associated with less-diverse diet
  • Less-diverse diet, in turn, is associated with lower microbiome alpha-diversity

There is increasing interest in the potential contribution of the gut microbiome to autism spectrum disorder (ASD). However, previous studies have been underpowered and have not been designed to address potential confounding factors in a comprehensive way.

We performed a large autism stool metagenomics study (n = 247) based on participants from the Australian Autism Biobank and the Queensland Twin Adolescent Brain project.

We found negligible direct associations between ASD diagnosis and the gut microbiome. Instead, our data support a model whereby ASD-related restricted interests are associated with less-diverse diet, and in turn reduced microbial taxonomic diversity and looser stool consistency⁠. In contrast to ASD diagnosis, our dataset was well powered to detect microbiome associations with traits such as age, dietary intake, and stool consistency.

Overall, microbiome differences in ASD may reflect dietary preferences that relate to diagnostic features, and we caution against claims that the microbiome has a driving role in ASD.

[Keywords: autism spectrum disorder, autism, gut microbiome, restricted and repetitive behaviors and interests, diet, metagenomics, stool consistency, brain-gut-microbiome axis]

“General Dimensions of Human Brain Morphometry Inferred from Genome-wide Association Data”, Fürtjes et al 2021

“General dimensions of human brain morphometry inferred from genome-wide association data”⁠, Anna Elisabeth Fürtjes, Ryan Arathimos, Jonathan R. I. Coleman, James H. Cole, Simon R. Cox, Ian J. Deary et al (2021-10-25; backlinks; similar):

The human brain is organised into networks of interconnected regions that have highly correlated volumes. In this study, we aim to triangulate insights into brain organisation and its relationship with cognitive ability and ageing, by analysing genetic data.

We estimated general genetic dimensions of human brain morphometry within the whole brain, and 9 predefined canonical brain networks of interest. We did so based on principal components analysis (PCA) of genetic correlations among grey-matter volumes for 83 cortical and subcortical regions (n = 36,778 participants).

We found that the corresponding general dimension of brain morphometry accounts for 40% of the genetic variance in the individual brain regions across the whole brain, and 47–65% within each network of interest. This genetic correlation structure of regional brain morphometry closely resembled the phenotypic correlation structure of the same regions. Applying a novel multivariate methodology for calculating SNP effects for each of the general dimensions identified, we find that general genetic dimensions of morphometry within networks are negatively associated with brain age (rg = −0.34) and profiles characteristic of age-related neurodegeneration, as indexed by cross-sectional age-volume correlations (r = −0.27). The same genetic dimensions were positively associated with a genetic general factor of cognitive ability (rg = 0.17–0.21 for different networks).
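The “general dimension” here is simply the first principal component of a correlation matrix; a toy numpy sketch (simulated data, not the UK Biobank pipeline) shows how to read off the variance it explains:

```python
# Toy example: simulate 83 regional volumes that all load on one shared factor, compute their
# correlation matrix, and report the variance explained by the first principal component.
import numpy as np

rng = np.random.default_rng(1)
n_regions, n_people = 83, 5000
loadings = rng.uniform(0.4, 0.8, size=(n_regions, 1))     # every region loads on one shared factor
factor = rng.normal(0, 1, size=(1, n_people))
volumes = loadings @ factor + rng.normal(0, 1, size=(n_regions, n_people))

corr = np.corrcoef(volumes)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]               # sorted descending
print(f"first PC explains {eigenvalues[0] / eigenvalues.sum():.0%} of the variance")
```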

We have provided a statistical framework to index general dimensions of shared genetic morphometry that vary between brain networks, and report evidence for a shared biological basis underlying brain morphometry, cognitive ability, and brain ageing, that are underpinned by general genetic factors.

…This indicates that the genetic association between brain morphometry and cognitive ability was not driven by specific network configurations. Instead, dimensions of shared genetic morphometry in general indexed genetic variance relevant to larger brain volumes and a brain organisation that is advantageous for better cognitive performance. This was regardless of how many brain regions and from which regions the measure of shared genetic morphometry was extracted. This lack of differentiation between networks, in how strongly they correlate with cognitive ability, is in line with the suggestion that the total number of neurons in the mammalian cortex, which should at least partly correspond to its volume, is a major predictor of higher cognitive ability [37]. These findings suggest that highly shared brain morphometry between regions, and its genetic analogue, indicate a generally bigger, and cognitively better-functioning brain.

“MegaLMM: Mega-scale Linear Mixed Models for Genomic Predictions With Thousands of Traits”, Runcie et al 2021

“MegaLMM: Mega-scale linear mixed models for genomic predictions with thousands of traits”⁠, Daniel E. Runcie, Jiayi Qu, Hao Cheng, Lorin Crawford (2021-07-23; backlinks; similar):

[Previously: Grid-LMM (Runcie & Crawford 2019).] Large-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as human genetics. However, the statistical foundation of multi-trait genomic prediction is based on the multivariate linear mixed effect model⁠, a tool notorious for its fragility when applied to more than a handful of traits. We present MegaLMM, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits. Using 3 examples with real plant data, we show that MegaLMM can leverage thousands of traits at once to substantially improve genetic value prediction accuracy.

…Here, we describe MegaLMM (linear mixed models for millions of observations), a novel statistical method and computational algorithm for fitting massive-scale MvLMMs to large-scale phenotypic datasets. Although we focus on plant breeding applications for concreteness, our method can be broadly applied wherever multi-trait linear mixed models are used (eg. human genetics, industrial experiments, psychology, linguistics, etc.). MegaLMM dramatically improves upon existing methods that fit low-rank MvLMMs, allowing multiple random effects and un-balanced study designs with large amounts of missing data. We achieve both scalability and statistical robustness by combining strong, but biologically motivated, Bayesian priors for statistical regularization—analogous to the p ≫ n approach of genomic prediction methods—with algorithmic innovations recently developed for LMMs. In the 3 examples below, we demonstrate that our algorithm maintains high predictive accuracy for tens-of-thousands of traits, and dramatically improves the prediction of genetic values over existing methods when applied to data from real breeding programs.

…Together, the set of parallel univariate LMMs and the set of factor loading vectors result in a novel and very general re-parameterization of the MvLMM framework as a mixed-effect factor model. This parameterization leads to dramatic computational performance gains by avoiding all large matrix inversions. It also serves as a scaffold for eliciting Bayesian priors that are intuitive and provide powerful regularization which is necessary for robust performance with limited data. Our default prior distributions encourage: (1) shrinkage on the factor-trait correlations (λjk) to avoid over-fitting covariances, and (2) shrinkage on the factor sizes to avoid including too many latent traits. This 2-dimensional regularization helps the model focus only on the strongest, most relevant signals in the data.
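The core re-parameterization is a factor model: thousands of observed traits are expressed as a modest number of latent factors times a loadings matrix, and each latent factor then gets its own cheap univariate mixed model. A conceptual numpy sketch of that data structure (not MegaLMM itself, and omitting the genetic variance components and priors):

```python
# Conceptual sketch of the mixed-effect factor re-parameterization: Y ≈ F @ Lambda + E, where F
# (individuals x factors) replaces a traits-by-traits covariance matrix that would be infeasible
# to work with when the number of traits is in the thousands.
import numpy as np

rng = np.random.default_rng(2)
n_individuals, n_traits, n_factors = 500, 2000, 10

F = rng.normal(0, 1, size=(n_individuals, n_factors))    # latent factors (each would get its own univariate LMM)
Lambda = rng.normal(0, 0.3, size=(n_factors, n_traits))  # factor loadings, shrunk toward zero in the real method
E = rng.normal(0, 1, size=(n_individuals, n_traits))     # trait-specific residuals
Y = F @ Lambda + E                                       # observed trait matrix

print(Y.shape, "observations modeled via", n_factors, "latent factors instead of a",
      f"{n_traits}x{n_traits} trait covariance matrix")
```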

Model limitations: While MegaLMM works well across a wide range of applications in breeding programs, our approach does have some limitations.

First, since MegaLMM is built on the Grid-LMM framework for efficient likelihood calculations [22], it does not scale well to large numbers of observations (in contrast to large numbers of traits), or large numbers of random effects. As the number of observational units increases, MegaLMM’s memory requirements increase quadratically because of the requirement to store sets of pre-calculated inverse-variance matrices. Similarly, for each additional random effect term included in the model, memory requirements increase exponentially. Therefore, we generally limit models to fewer than 10,000 observations [n] and only 1-to-4 random effect terms per trait. There may be opportunities to reduce this memory burden if some of the random effects are low-rank; then these random effects could be updated on the fly using efficient routines for low-rank Cholesky updates. We also do not currently suggest including regressions directly on markers and have used marker-based kinship matrices here instead for computational efficiency. Therefore as a stand-alone prediction method, MegaLMM requires calculations involving the Schur complement of the joint kinship matrix of the testing and training individuals which can be computationally costly.

Second, MegaLMM is inherently a linear model and cannot effectively model trait relationships that are non-linear. Some non-linear relationships between predictor variables (like genotypes) and traits can be modeled through non-linear kernel matrices, as we demonstrated with the RKHS application to the Bread Wheat data. However, allowing non-linear relationships among traits is currently beyond the capacity of our software and modeling approach. Extending our mixed effect model on the low-dimensional latent factor space to a non-linear modeling structure like a neural network may be an exciting area for future research. Also, some sets of traits may not have low-rank correlation structures that are well-approximated by a factor model. For example, certain auto-regressive dependence structures are low-rank but cannot efficiently be decomposed into a discrete set of factors.

Nevertheless, we believe that in its current form, MegaLMM will be useful to a wide range of researchers in quantitative genetics and plant breeding.

“Rare Greek Variables”, Branwen 2021

Variables: “Rare Greek Variables”⁠, Gwern Branwen (2021-04-08; backlinks; similar):

I scrape Arxiv to find underused Greek variables which can add some diversity to math; the top 10 underused letters are ϰ, ς, υ, ϖ, Υ, Ξ, ι, ϱ, ϑ, & Π. Avoid overused letters like λ, and spice up your next paper with some memorable variables!

Some Greek alphabet variables are just plain overused. It seems like no paper is complete without a bunch of E or μ or α variables splattered across it—and they all mean different things in different papers, and that’s when they don’t mean different things in the same paper! In the spirit of offering constructive criticism, might I suggest that, based on Arxiv frequency of usage, you experiment with more recherché, even outré, variables?

Instead of reaching for that exhausted π, why not use… ϰ (variant kappa)? (It looks like a Hebrew escapee…) Or how about ς (variant sigma), which is calculated to get your reader’s attention by making them go “ςςς” and exclaim “these letters are Greek to me!”

The top 10 least-used Greek variables on Arxiv⁠, rarest to more common:

  1. \varkappa (ϰ)
  2. \varsigma (ς)
  3. \upsilon (υ)
  4. \varpi (ϖ)
  5. \Upsilon (Υ)
  6. \varrho (ϱ)
  7. \Xi (Ξ)
  8. \vartheta (ϑ)
  9. \iota (ι)
  10. \Pi (Π)
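The counting step behind a ranking like the list above is straightforward; a rough sketch (the `arxiv_sources/` directory of extracted `.tex` files is hypothetical, and the real analysis scraped Arxiv at much larger scale):

```python
# Tally Greek-letter macros across a directory of LaTeX sources and print them rarest-first.
import re
from collections import Counter
from pathlib import Path

GREEK = [r"\varkappa", r"\varsigma", r"\upsilon", r"\varpi", r"\Upsilon",
         r"\varrho", r"\Xi", r"\vartheta", r"\iota", r"\Pi", r"\lambda", r"\mu", r"\alpha"]
pattern = re.compile("(?:" + "|".join(re.escape(g) for g in GREEK) + r")(?![A-Za-z])")

counts = Counter()
for tex in Path("arxiv_sources").glob("**/*.tex"):          # hypothetical corpus location
    counts.update(pattern.findall(tex.read_text(errors="ignore")))

for macro, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{macro:12s} {n}")
```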

“A Parsimonious Model for Mass-univariate Vertex-wise Analysis”, Couvy-Duchesne et al 2021

“A parsimonious model for mass-univariate vertex-wise analysis”⁠, Baptiste Couvy-Duchesne, Futao Zhang, Kathryn E. Kemper, Julia Sidorenko, Naomi R. Wray, Peter M. Visscher et al (2021-01-22; backlinks; similar):

Covariance between grey-matter measurements can reflect structural or functional brain networks, though it has also been shown to be influenced by confounding factors (eg. age, head size, scanner), which could lead to lower mapping precision (increased size of associated clusters) and create distal false-positive associations in mass-univariate vertex-wise analyses.

We evaluated this concern by performing state-of-the-art mass-univariate analyses (generalized linear model⁠, GLM) on traits simulated from real vertex-wise grey matter data (including cortical and subcortical thickness and surface area). We contrasted the results with those from linear mixed models (LMMs), which have been shown to overcome similar issues in omics association studies.

We showed that when performed on a large sample (n = 8,662, UK Biobank), GLMs yielded large spatial clusters of statistically-significant vertices and a greatly inflated false positive rate (Family Wise Error Rate: FWER = 1, cluster false discovery rate: FDR>0.6). We showed that LMMs gave more parsimonious results: smaller clusters and reduced false positive rate (yet FWER>5% after Bonferroni correction), but at a cost of increased computation. In practice, the parsimony of LMMs results from controlling for the joint effect of all vertices, which prevents local and distal redundant associations from reaching statistical-significance.

Next, we performed mass-univariate association analyses on 5 real UKB traits (age, sex, BMI⁠, fluid intelligence and smoking status) and LMMs yielded fewer and more localized associations. We identified 19 statistically-significant clusters displaying small associations with age, sex, and BMI, suggesting a complex architecture of at least dozens of areas associated with those phenotypes.

“Ensemble Learning of Convolutional Neural Network, Support Vector Machine, and Best Linear Unbiased Predictor for Brain Age Prediction: ARAMIS Contribution to the Predictive Analytics Competition 2019 Challenge”, Couvy-Duchesne et al 2020

“Ensemble Learning of Convolutional Neural Network, Support Vector Machine, and Best Linear Unbiased Predictor for Brain Age Prediction: ARAMIS Contribution to the Predictive Analytics Competition 2019 Challenge”⁠, Baptiste Couvy-Duchesne, Johann Faouzi, Benoît Martin, Elina Thibeau-Sutre, Adam Wild, Manon Ansart, Stanley Durrleman et al (2020-12-15; backlinks; similar):

We ranked third in the Predictive Analytics Competition (PAC) 2019 challenge by achieving a mean absolute error (MAE) of 3.33 years in predicting age from T1-weighted MRI brain images. Our approach combined seven algorithms that allow generating predictions when the number of features exceeds the number of observations, in particular, two versions of best linear unbiased predictor (BLUP), support vector machine (SVM), two shallow convolutional neural networks (CNNs), and the famous ResNet and Inception V1. Ensemble learning was derived from estimating weights via linear regression in a hold-out subset of the training sample. We further evaluated and identified factors that could influence prediction accuracy: choice of algorithm, ensemble learning, and features used as input/​MRI image processing. Our prediction error was correlated with age, and absolute error was greater for older participants, suggesting the need to increase the training sample for this subgroup. Our results may be used to guide researchers to build age predictors on healthy individuals, which can be used in research and in the clinic as non-specific predictors of disease status.
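The ensembling step described here is ordinary stacking: regress the hold-out target on the base models’ hold-out predictions and use the fitted coefficients as combination weights. A hedged sketch with simulated predictions (illustrative only, not the ARAMIS pipeline):

```python
# Stacking sketch: learn linear weights for several base models' age predictions on a hold-out set.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
true_age = rng.uniform(18, 90, 300)                         # hold-out subjects

base_predictions = np.column_stack([                        # pretend hold-out predictions of the base models
    true_age + rng.normal(0, 6, true_age.size),             # e.g. a BLUP model
    true_age + rng.normal(0, 5, true_age.size),             # e.g. an SVM
    true_age + rng.normal(0, 4, true_age.size),             # e.g. a CNN
])

stacker = LinearRegression().fit(base_predictions, true_age)
ensemble = stacker.predict(base_predictions)
print("ensemble MAE:", np.mean(np.abs(ensemble - true_age)).round(2), "years")
print("stacking weights:", stacker.coef_.round(2))
```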

[Keywords: brain age, MRI, machine learning, deep learning, statistical learning, ensemble learning]

Morphometricity of Age as Upper Bound of Prediction Accuracy: From BLUP models, we estimated the total association between age and the brain features. Morphometricity is expressed as a proportion of the variance (R2) of age; thus, it quantifies how much of the differences in age in the sample may be attributed to/​associated with variation in brain structure. With surface-based processing (~650,000 vertices), we estimated the morphometricity to be R2 = 0.99 (SE = 0.052), while for volume-based processing (~480,000 voxels), it reached R2 = 0.97 (SE = 0.015).
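Morphometricity is a variance-components quantity: the share of phenotypic variance captured by a similarity matrix built from brain features. The paper estimates it with BLUP/REML; a much cruder method-of-moments sketch (Haseman-Elston-style regression on simulated data) conveys the idea:

```python
# Simplified morphometricity sketch: build a morphometric relationship matrix K from standardized
# vertex features, then regress pairwise phenotype products on pairwise similarity; the slope
# approximates the variance explained by brain structure. (Not the REML/BLUP used in the paper.)
import numpy as np

rng = np.random.default_rng(4)
n, p, true_m2 = 400, 1000, 0.6                      # people, vertices, simulated morphometricity

Z = rng.normal(0, 1, size=(n, p))                   # standardized vertex-wise features
effects = rng.normal(0, np.sqrt(true_m2 / p), p)    # tiny effect at every vertex
age = Z @ effects + rng.normal(0, np.sqrt(1 - true_m2), n)
age = (age - age.mean()) / age.std()

K = Z @ Z.T / p                                     # morphometric relationship matrix
pairs = np.triu_indices(n, k=1)                     # off-diagonal pairs only
slope = np.polyfit(K[pairs], np.outer(age, age)[pairs], 1)[0]
print(f"estimated morphometricity ≈ {slope:.2f} (simulated truth {true_m2})")
```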

“A Unified Framework for Association and Prediction from Vertex-wise Grey-matter Structure”, Couvy-Duchesne et al 2020

“A unified framework for association and prediction from vertex-wise grey-matter structure”⁠, Baptiste Couvy-Duchesne, Lachlan T. Strike, Futao Zhang, Yan Holtz, Zhili Zheng, Kathryn E. Kemper, Loic Yengo et al (2020-07-20; backlinks; similar):

The recent availability of large-scale neuroimaging cohorts facilitates deeper characterisation of the relationship between phenotypic and brain architecture variation in humans. Here, we investigate the association (previously coined morphometricity) of a phenotype with all 652,283 vertex-wise measures of cortical and subcortical morphology in a large data set from the UK Biobank (UKB; n = 9,497 for discovery, n = 4,323 for replication) and the Human Connectome Project (n = 1,110).

We used a linear mixed model with the brain measures of individuals fitted as random effects with covariance relationships estimated from the imaging data. We tested 167 behavioural, cognitive, psychiatric or lifestyle phenotypes and found statistically-significant morphometricity for 58 phenotypes (spanning substance use, blood assay results, education or income level, diet, depression, and cognition domains), 23 of which replicated in the UKB replication set or the HCP. We then extended the model for a bivariate analysis to estimate grey-matter correlation between phenotypes, which revealed that body size (ie. height, weight, BMI, waist and hip circumference, body fat percentage) could account for a substantial proportion of the morphometricity (confirmed using a conditional analysis), providing possible insight into previous MRI case-control results for psychiatric disorders where case status is associated with body mass index. Our LMM framework also allowed us to predict some of the associated phenotypes from the vertex-wise measures, in two independent samples. Finally, we demonstrated additional new applications of our approach: (a) region of interest (ROI) analysis that retains the vertex-wise complexity; (b) comparison of the information retained by different MRI processing options.

“Randomized Experiments in Education, With Implications for Multilevel Causal Inference”, Raudenbush & Schwartz 2020

2020-raudenbush.pdf: “Randomized Experiments in Education, with Implications for Multilevel Causal Inference”⁠, Stephen W. Raudenbush, Daniel Schwartz (2020-03-01; similar):

Education research has experienced a methodological renaissance over the past two decades, with a new focus on large-scale randomized experiments. This wave of experiments has made education research an even more exciting area for statisticians, unearthing many lessons and challenges in experimental design, causal inference, and statistics more broadly. Importantly, educational research and practice almost always occur in a multilevel setting, which makes the statistics relevant to other fields with this structure, including social policy, health services research, and clinical trials in medicine. In this article we first briefly review the history that led to this new era in education research and describe the design features that dominate the modern large-scale educational experiments. We then highlight some of the key statistical challenges in this area, including endogeneity of design, heterogeneity of treatment effects, noncompliance with treatment assignment, mediation, generalizability, and spillover. Though a secondary focus, we also touch on promising trial designs that answer more nuanced questions, such as the SMART design for studying dynamic treatment regimes and factorial designs for optimizing the components of an existing treatment.

“Visual Model Fit Estimation in Scatterplots and Distribution of Attention: Influence of Slope and Noise Level”, Reimann et al 2020

2020-reimann.pdf: “Visual model fit estimation in scatterplots and distribution of attention: Influence of slope and noise level”⁠, Daniel Reimann, Christine Blech, Robert Gaschler (2020; similar):

Scatterplots are ubiquitous data graphs and can be used to depict how well data fit to a quantitative theory. We investigated which information is used for such estimates.

In Experiment 1 (n = 25), we tested the influence of slope and noise on perceived fit between a linear model and data points. Additionally, eye tracking was used to analyze the deployment of attention. Visual fit estimation might mimic one or the other statistical estimate: If participants were influenced by noise only, this would suggest that their subjective judgment was similar to root mean square error⁠. If slope was relevant, subjective estimation would mimic variance explained. While the influence of noise on estimated fit was stronger, we also found an influence of slope.

As most of the fixations fell into the center of the scatterplot, in Experiment 2 (n = 51), we tested whether location of noise affects judgment. Indeed, high noise influenced the judgment of fit more strongly if it was located in the middle of the scatterplot.

Visual fit estimates seem to be driven by the center of the scatterplot and to mimic variance explained.
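The contrast the authors draw between the two candidate statistics is easy to make concrete: with the same vertical noise, a steeper slope leaves RMSE unchanged but raises variance explained. A small illustrative computation:

```python
# With identical noise, RMSE around the fitted line is the same for both slopes, but R²
# (variance explained) is much higher for the steeper slope.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 200)
noise = rng.normal(0, 0.3, size=x.size)

for slope in (0.5, 2.0):
    y = slope * x + noise
    fitted = slope * x                               # the model line shown in the scatterplot
    rmse = np.sqrt(np.mean((y - fitted) ** 2))
    r2 = 1 - np.var(y - fitted) / np.var(y)
    print(f"slope={slope}: RMSE={rmse:.2f}, R²={r2:.2f}")
```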

“GPT-2 Folk Music”, Branwen & Presser 2019

GPT-2-music: “GPT-2 Folk Music”⁠, Gwern Branwen, Shawn Presser (2019-11-01; backlinks; similar):

Generating Irish/​folk/​classical music in ABC format using GPT-2-117M, with good results.

In November 2019, I experimented with training a GPT-2 neural net model to generate folk music in the high-level ABC music text format, following previous work in 2016 which used a char-RNN trained on a ‘The Session’ dataset. A GPT-2 hypothetically can improve on an RNN by better global coherence & copying of patterns, without problems with the hidden-state bottleneck.

I encountered problems with the standard GPT-2 model’s encoding of text which damaged results, but after fixing that⁠, I successfully trained it on n = 205,304 ABC music pieces taken from The Session & ABCnotation.com. The resulting music samples are in my opinion quite pleasant. (A similar model was later retrained by Geerlings & Meroño-Peñuela 2020⁠.)

The ABC folk model & dataset are available for download⁠, and I provide for listening selected music samples as well as medleys of random samples from throughout training.

We followed the ABC folk model with an ABC-MIDI model: a dataset of 453k ABC pieces decompiled from MIDI pieces, which fit into GPT-2-117M with an expanded context window when trained on TPUs⁠. The MIDI pieces are far more diverse and challenging, and GPT-2 underfits and struggles to produce valid samples but when sampling succeeds, it can generate even better musical samples⁠.

“Widespread Associations between Grey Matter Structure and the Human Phenome”, Couvy-Duchesne et al 2019

“Widespread associations between grey matter structure and the human phenome”⁠, Baptiste Couvy-Duchesne, Lachlan T. Strike, Futao Zhang, Yan Holtz, Zhili Zheng, Kathryn E. Kemper, Loic Yengo et al (2019-07-09; backlinks; similar):

The recent availability of large-scale neuroimaging cohorts (here the UK Biobank [UKB] and the Human Connectome Project [HCP]) facilitates deeper characterisation of the relationship between phenotypic and brain architecture variation in humans. We tested the association between 654,386 vertex-wise measures of cortical and subcortical morphology (from T1w and T2w MRI images) and behavioural, cognitive, psychiatric and lifestyle data. We found a statistically-significant association of grey-matter structure with 58 out of 167 UKB phenotypes spanning substance use, blood assay results, education or income level, diet, depression, being a twin, as well as cognition domains (UKB discovery sample: n = 9,888). Twenty-three of the 58 associations replicated (UKB replication sample: n = 4,561; HCP, n = 1,110). In addition, differences in body size (height, weight, BMI, waist and hip circumference, body fat percentage) could account for a substantial proportion of the association, providing possible insight into previous MRI case-control studies for psychiatric disorders where case status is associated with body mass index. Using the same linear mixed model, we showed that most of the associated characteristics (eg. age, sex, body size, diabetes, being a twin, maternal smoking) could be significantly predicted using all the brain measurements in out-of-sample prediction. Finally, we demonstrated other applications of our approach including a Region Of Interest (ROI) analysis that retains the vertex-wise complexity and ranking of the information contained across MRI processing options.

Highlights: Our linear mixed model approach unifies association and prediction analyses for high-dimensional vertex-wise MRI data

Grey-matter structure is associated with measures of substance use, blood assay results, education or income level, diet, depression, being a twin as well as cognition domains

Body size (height, weight, BMI, waist and hip circumference) is an important source of covariation between the phenome and grey-matter structure

Grey-matter scores quantify grey-matter-based risk for the associated traits and allow studying phenotypes not collected

The most general cortical processing (“fsaverage” mesh with no smoothing) maximises the brain-morphometricity for all UKB phenotypes

“GPT-2 Neural Network Poetry”, Branwen & Presser 2019

GPT-2: “GPT-2 Neural Network Poetry”⁠, Gwern Branwen, Shawn Presser (2019-03-03; backlinks; similar):

Demonstration tutorial of retraining OpenAI’s GPT-2 (a text-generating Transformer neural network) on large poetry corpuses to generate high-quality English verse.

In February 2019, following up on my 2015–2016 text-generation experiments with char-RNNs⁠, I experiment with the cutting-edge Transformer NN architecture for language modeling & text generation. Using OpenAI’s GPT-2-117M (117M) model pre-trained on a large Internet corpus and nshepperd’s finetuning code, I retrain GPT-2-117M on a large (117MB) Project Gutenberg poetry corpus. I demonstrate how to train 2 variants: “GPT-2-poetry”, trained on the poems as a continuous stream of text, and “GPT-2-poetry-prefix”, with each line prefixed with the metadata of the PG book it came from. In May 2019, I trained the next-largest GPT-2, GPT-2-345M, similarly, for a further quality boost in generated poems. In October 2019, I retrained GPT-2-117M on a Project Gutenberg corpus with improved formatting, and combined it with a contemporary poem dataset based on Poetry Foundation’s website⁠.

With just a few GPU-days on 1080ti GPUs, GPT-2-117M finetuning can produce high-quality poetry which is more thematically consistent than my char-RNN poems, capable of modeling subtle features like rhyming, and sometimes even a pleasure to read. I list the many possible ways to improve poem generation and further approach human-level poems. For the highest-quality AI poetry to date, see my followup pages, “GPT-3 Creative Writing”⁠/​“GPT-3 Non-Fiction”⁠.

For anime plot summaries, see TWDNE⁠; for generating ABC-formatted folk music, see “GPT-2 Folk Music” & “GPT-2 Preference Learning for Music and Poetry Generation”⁠; for playing chess, see “A Very Unlikely Chess Game”⁠; for the Reddit comment generator, see SubSimulatorGPT-2⁠; for fanfiction, the Ao3⁠; and for video games, the walkthrough model⁠. For OpenAI’s GPT-3 followup, see “GPT-3: Language Models are Few-Shot Learners”⁠.

“Origin of ‘Littlewood’s Law of Miracles’”, Branwen 2019

Littlewood-origin: “Origin of ‘Littlewood’s Law of Miracles’”⁠, Gwern Branwen (2019-02-16; backlinks; similar):

Leprechaun hunting the origins of the famous skeptical observation that because millions of events are constantly happening, ‘miracles’ happen once a month; it was actually coined by Freeman Dyson⁠.

I try to trace back “Littlewood’s Law of Miracles” to its supposed source in Littlewood’s A Mathematician’s Miscellany. It does not appear in that book, making it a leprechaun⁠, and further investigation indicates that Littlewood did not come up with it but that Freeman Dyson coined it in 2004, probably based on the earlier “Law of Truly Large Numbers” coined by Diaconis & Mosteller 1989⁠, in a case of Stigler’s law.
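The arithmetic behind the ‘law’ (in Dyson’s formulation, roughly one noticeable event per second during alert hours) is a one-liner:

```python
# Back-of-the-envelope: ~1 event/second for ~8 alert hours/day over a month ≈ one million events,
# so a one-in-a-million 'miracle' is expected roughly once a month.
events_per_second = 1
alert_hours_per_day = 8
days_per_month = 30
events = events_per_second * alert_hours_per_day * 3600 * days_per_month
print(events, "events →", events / 1_000_000, "expected one-in-a-million events per month")
```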

“Bifactor and Hierarchical Models: Specification, Inference, and Interpretation”, Markon 2019

2019-markon.pdf: “Bifactor and Hierarchical Models: Specification, Inference, and Interpretation”⁠, Kristian E. Markon (2019-01-16; backlinks; similar):

Bifactor and other hierarchical models [in factor analysis] have become central to representing and explaining observations in psychopathology, health, and other areas of clinical science, as well as in the behavioral sciences more broadly. This prominence comes after a relatively rapid period of rediscovery, however, and certain features remain poorly understood.

Here, hierarchical models are compared and contrasted with other models of superordinate structure, with a focus on implications for model comparisons and interpretation. Issues pertaining to the specification and estimation of bifactor and other hierarchical models are reviewed in exploratory as well as confirmatory modeling scenarios⁠, as are emerging findings about model fit and selection⁠.

Bifactor and other hierarchical models provide a powerful mechanism for parsing shared and unique components of variance⁠, but care is required in specifying and making inferences about them.

[Keywords: hierarchical model, higher order, bifactor, model equivalence, model complexity]

Figure 1: Hierarchical and related models. (a) Spearman’s (1904a, 1904b) 2-factor model, a precursor to hierarchical and bifactor models. The 2-factor model includes a general factor (G) as well as systematic specific factors (S) and random error factors (e). As originally formulated, Spearman’s 2-factor model cannot be estimated, but it established the idea of a superordinate general factor plus subordinate specific factors that account for systematic residual influences not accounted for by the general factor. (b) The hierarchical or bifactor model, which includes superordinate general factors (G) as well as subordinate specific factors (S); error factors are not shown. Bifactor models are a subtype of hierarchical model with one superordinate factor and multiple subordinate factors. The 2-factor model and hierarchical model are examples of top-down models, in that subordinate factors instantiate residual effects that are unexplained by the superordinate factor.
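A small simulation can make the bifactor structure in panel (b) concrete: each indicator is the sum of one orthogonal general factor, one specific factor, and error. This is purely illustrative (numpy, made-up loadings):

```python
# Simulate 12 indicators with a bifactor structure: one general factor shared by all items,
# plus 3 orthogonal specific factors of 4 items each, plus random error.
import numpy as np

rng = np.random.default_rng(6)
n, n_specific, items_per = 1000, 3, 4

G = rng.normal(size=n)                                # general factor
S = rng.normal(size=(n, n_specific))                  # specific factors, orthogonal to G and each other
X = np.column_stack([0.6 * G + 0.4 * S[:, k] + rng.normal(0, 0.5, n)
                     for k in range(n_specific) for _ in range(items_per)])

# All items correlate (general factor); items sharing a specific factor correlate extra strongly.
print(np.corrcoef(X, rowvar=False).round(2))
```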

…Bifactor models are now ubiquitous in the structural modeling of psychopathology. They have been central to general factor models of psychopathology (eg. Caspi et al 2014⁠, Laceulle et al 2015⁠, Lahey et al 2012⁠, Stochl et al 2015) and have become a prominent focus in modeling a range of phenomena as diverse as internalizing psychopathology (Naragon-Gainey et al 2016), externalizing psychopathology (Krueger et al 2007), psychosis (Shevlin et al 2017), somatic-related psychopathology (Witthöft et al 2016), cognitive functioning (Frisby & Beaujean 2015), and constructs central to prominent therapeutic paradigms (Aguado et al 2015). They have also become central to modeling method effects, such as informant (Bauer et al 2013), keying (Gu et al 2017⁠, Tomas & Oliver 1999), and other effects (DeMars 2006), and they have been used to explicate fundamental elements of measurement theory (Eid et al 2017).

Although bifactor and other hierarchical models are now commonplace, this was not always so. Their current ubiquity follows a long period of relative neglect (Reise 2012), having been derived in the early 20th century (Holzinger & Harman 1938⁠, Holzinger & Swineford 1937) before being somewhat overlooked for a number of decades and then being rediscovered more recently. Bifactor models were mistakenly dismissed as equivalent to and redundant with other superordinate structural models (eg. Adcock 1964, Humphreys 1981, Wherry 1959, Reise 2012, Yung et al 1999); as differences between bifactor models and other types of superordinate structural models became more recognized (Yung et al 1999), interest in bifactor models reemerged.

Summary Points:

  1. Bifactor and other hierarchical models represent superordinate structure in terms of orthogonal general and specific factors representing distinct, non-nested components of shared variance among indicators. This contrasts with higher-order models, which represent superordinate structure in terms of specific factors that are nested in general factors, and correlated-factors models, which represent superordinate structure in terms of correlations among subordinate factors.
  2. Higher-order models can be approached as a constrained form of hierarchical models, in which direct relationships between superordinate factors and observed variables in the hierarchical model are constrained to equal the products of superordinate-subordinate paths and subordinate-observed variable paths.
  3. Multiple exploratory factor analytic approaches to the delineation of hierarchical structure are available, including rank-deficient transformations, analytic rotations, and targeted rotations. Among other things, these transformations and rotations differ in the number of factors being rotated, the nature of those factors, and how superordinate factor structures are approximated.
  4. Misspecification or under-specification of confirmatory bifactor and hierarchical models can occur for multiple reasons. Problems with model identification may occur (1) with specific patterns of homogeneity in estimated or observed covariances, (2) if factors are allowed to correlate in inadmissible ways, or (3) if covariate paths imply inadmissible correlations. Signs of model misspecification may be evident in anomalous estimates, such as loading estimates near boundaries, or estimates that are suggestive of other types of models.
  5. Common model fit statistics can overstate the fit of bifactor models due to the tendency of bifactor and other hierarchical models to overfit to data in general, regardless of plausibility or population structure. Hierarchical models are similar to exploratory factor models in their expansiveness of fit, and, in general, they are more expansive in fit than other confirmatory models.

Future Issues:

  1. Research is needed to determine how to best account for the flexibility of hierarchical models when comparing models and evaluating model fit, given that the relative flexibility of hierarchical models can only partly be accounted for by the number of parameters. Approaches based on minimum description length and related paradigms, such as Bayesian inference with reference priors⁠, are promising in this regard.
  2. More research is needed to clarify the properties of hierarchical structures when they are embedded in longitudinal models and models with covariates. As with challenges of multicollinearity in regression, parsing unique general and specific factor components of explanatory paths may be inferentially challenging in the presence of strongly related predictors, covariates, and outcomes.
  3. More can be learned about the specification and identification of hierarchical models and the relationships between hierarchical models and other types of models, such as exploratory factor models. Similarities in overfitting patterns between exploratory and hierarchical models, approaches to hierarchical structure through bifactor rotations, and patterns of anomalous estimates that are sometimes obtained with hierarchical models, point to important relationships between exploratory and hierarchical models. Further explication of model specification principles with hierarchical models would also help clarify the appropriate structures to consider when evaluating models.

“Predicting Human Inhibitory Control from Brain Structural MRI”, He et al 2019

2019-he.pdf: “Predicting human inhibitory control from brain structural MRI”⁠, Ningning He, Edmund T. Rolls, Wei Zhao, Shuixia Guo (2019-01-01; backlinks)

“Statistical Aspects of Wasserstein Distances”, Panaretos & Zemel 2019

2019-panaretos.pdf: “Statistical Aspects of Wasserstein Distances”⁠, Victor M. Panaretos, Yoav Zemel (2019; similar):

Wasserstein distances are metrics on probability distributions inspired by the problem of optimal mass transportation. Roughly speaking, they measure the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution. They are ubiquitous in mathematics, with a long history that has seen them catalyze core developments in analysis, optimization, and probability. Beyond their intrinsic mathematical richness, they possess attractive features that make them a versatile tool for the statistician: They can be used to derive weak convergence and convergence of moments, and can be easily bounded; they are well-adapted to quantify a natural notion of perturbation of a probability distribution; and they seamlessly incorporate the geometry of the domain of the distributions in question, thus being useful for contrasting complex objects. Consequently, they frequently appear in the development of statistical theory and inferential methodology, and they have recently become an object of inference in themselves. In this review, we provide a snapshot of the main concepts involved in Wasserstein distances and optimal transportation, and a succinct overview of some of their many statistical aspects.
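In one dimension the distance has a closed form in terms of quantile functions and is available off the shelf; a quick numeric check with SciPy:

```python
# 1-Wasserstein distance between two empirical samples; shifting a distribution by 1 moves
# every unit of probability mass a distance of ~1, so the distance comes out near 1.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(7)
a = rng.normal(0, 1, 10_000)
b = rng.normal(1, 1, 10_000)
print(wasserstein_distance(a, b))   # ≈ 1.0
```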

“Finite Mixture Models”, McLachlan et al 2019

2019-mclachlan.pdf: “Finite Mixture Models”⁠, Geoffrey J. McLachlan, Sharon X. Lee, Suren I. Rathnayake (2019; similar):

The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and general scientific literature. The aim of this article is to provide an up-to-date account of the theory and methodological developments underlying the applications of finite mixture models.

Because of their flexibility, mixture models are being increasingly exploited as a convenient, semiparametric way in which to model unknown distributional shapes. This is in addition to their obvious applications where there is group-structure in the data or where the aim is to explore the data for such structure, as in a cluster analysis.

It has now been three decades since the publication of the monograph by McLachlan & Basford (1988) with an emphasis on the potential usefulness of mixture models for inference and clustering. Since then, mixture models have attracted the interest of many researchers and have found many new and interesting fields of application. Thus, the literature on mixture models has expanded enormously, and as a consequence, the bibliography here can only provide selected coverage.
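For the most common use case, a finite mixture is fit by EM in a few lines; a minimal scikit-learn example with simulated two-group data:

```python
# Fit a two-component Gaussian mixture by EM and recover the simulated group structure.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(data)
print("means:", gm.means_.ravel().round(2))      # ≈ [-2, 3]
print("weights:", gm.weights_.round(2))          # ≈ [0.5, 0.5]
print("cluster of x=0:", gm.predict([[0.0]]))    # posterior-based classification
```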

“Tweedie Gradient Boosting for Extremely Unbalanced Zero-inflated Data”, Zhou et al 2018

“Tweedie Gradient Boosting for Extremely Unbalanced Zero-inflated Data”⁠, He Zhou, Yi Yang, Wei Qian (2018-11-26; backlinks; similar):

Tweedie’s compound Poisson model is a popular method to model insurance claims with probability mass at zero and nonnegative, highly right-skewed distribution. In particular, it is not uncommon to have extremely unbalanced data with an excessively large proportion of zero claims, and even the traditional Tweedie model may not be satisfactory for fitting the data. In this paper, we propose a boosting-assisted zero-inflated Tweedie model, called EMTboost, that allows zero probability mass to exceed a traditional model. We make a nonparametric assumption on its Tweedie model component, which, unlike a linear model, is able to capture nonlinearities, discontinuities, and complex higher order interactions among predictors. A specialized Expectation-Maximization algorithm is developed that integrates a blockwise coordinate descent strategy and a gradient tree-boosting algorithm to estimate key model parameters. We use extensive simulation and data analysis on synthetic zero-inflated auto-insurance claim data to illustrate our method’s prediction performance.
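EMTboost itself (EM plus gradient tree boosting) is not reproduced here; as a much simpler baseline sketch of the Tweedie compound-Poisson component, one can fit a Tweedie GLM with 1 < power < 2 to simulated zero-inflated claim amounts using scikit-learn:

```python
# Baseline Tweedie GLM (log link, power between 1 and 2) on simulated zero-heavy claim data.
# This is the 'traditional Tweedie model' the paper improves on, not EMTboost itself.
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(9)
n = 5000
X = rng.normal(size=(n, 3))
n_claims = rng.poisson(lam=np.exp(-1.5 + 0.5 * X[:, 0]))                 # mostly zeros
amounts = np.array([rng.gamma(2.0, 200.0, k).sum() for k in n_claims])   # 0 when no claims occur

model = TweedieRegressor(power=1.5, alpha=0.0, link="log", max_iter=1000).fit(X, amounts)
print("share of zero claims:", round((amounts == 0).mean(), 2))
print("GLM coefficients:", model.coef_.round(2))
```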

“Raincloud Plots: a Multi-platform Tool for Robust Data Visualization”, Allen et al 2018

“Raincloud plots: a multi-platform tool for robust data visualization”⁠, Micah Allen, Davide Poggiali, Kirstie Whitaker, Tom R. Marshall, Rogier Kievit (2018-08-23; backlinks; similar):

Across scientific disciplines, there is a rapidly growing recognition of the need for more statistically robust, transparent approaches to data visualization. Complementary to this, many scientists have realized the need for plotting tools that accurately and transparently convey key aspects of statistical effects and raw data with minimal distortion.

Previously common approaches, such as plotting conditional mean or median barplots together with error-bars have been criticized for distorting effect size, hiding underlying patterns in the raw data, and obscuring the assumptions upon which the most commonly used statistical tests are based.

Here we describe a data visualization approach which overcomes these issues, providing maximal statistical information while preserving the desired ‘inference at a glance’ nature of barplots and other similar visualization devices. These “raincloud plots” [scatterplots + smoothed histograms⁠/​density plot + box plots] can visualize raw data, probability density, and key summary statistics such as median, mean, and relevant confidence intervals in an appealing and flexible format with minimal redundancy.

In this tutorial paper we provide basic demonstrations of the strength of raincloud plots and similar approaches, outline potential modifications for their optimal use, and provide open-source code for their streamlined implementation in R, Python and Matlab⁠. Readers can investigate the R and Python tutorials interactively in the browser using Binder by Project Jupyter⁠.

Figure 3: Example Raincloud plot. The raincloud plot combines an illustration of data distribution (the ‘cloud’), with jittered raw data (the ‘rain’). This can further be supplemented by adding box plots or other standard measures of central tendency and error.—See figure3.Rmd for code to generate this figure.

…To remedy these shortcomings, a variety of visualization approaches have been proposed, illustrated in Figure 2, below. One simple improvement is to overlay individual observations (datapoints) beside the standard bar-plot format, typically with some degree of randomized jitter to improve visibility (Figure 2A). Complementary to this approach, others have advocated for more statistically robust illustrations such as box plots (Tukey 1970), which display sample median alongside interquartile range. Dot plots can be used to combine a histogram-like display of distribution with individual data observations (Figure 2B). In many cases, particularly when parametric statistics are used, it is desirable to plot the distribution of observations. This can reveal valuable information about how eg. some condition may alter the skewness or overall shape of a distribution. In this case, the ‘violin plot’ (Figure 2C) which displays a probability density function of the data mirrored about the uninformative axis is often preferred (Hintze & Nelson 1998). With the advent of increasingly flexible and modular plotting tools such as ggplot2 (Wickham 2010; Wickham & Chang 2008), all of the aforementioned techniques can be combined in a complementary fashion…Indeed, this combined approach is typically desirable as each of these visualization techniques has its own trade-offs.

…On the other hand, the interpretation of dot plots depends heavily on the choice of dot-bin and dot-size, and these plots can also become extremely difficult to read when there are many observations. The violin plot, in which the probability density function (PDF) of observations is mirrored, combined with overlaid box plots, has recently become a popular alternative. This provides both an assessment of the data distribution and statistical inference at a glance (SIG) via overlaid box plots. However, there is nothing to be gained, statistically speaking, by mirroring the PDF in the violin plot, and therefore they are violating the philosophy of minimizing the “data-ink ratio” (Tufte 1983).

To overcome these issues, we propose the use of the ‘raincloud plot’ (Neuroconscience 2018), illustrated in Figure 3: The raincloud plot combines a wide range of visualization suggestions, and similar precursors have been used in various publications (eg. Ellison 1993, Figure 2.4; Wilson et al 2018). The plot attempts to address the aforementioned limitations in an intuitive, modular, and statistically robust format. In essence, raincloud plots combine a ‘split-half violin’ (an un-mirrored PDF plotted against the redundant data axis), raw jittered data points, and a standard visualization of central tendency (ie. mean or median) and error, such as a boxplot. As such the raincloud plot builds on code elements from multiple developers and scientific programming languages (Hintze & Nelson 1998; Patil 2018; Wickham & Chang 2008; Wilke 2017).
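The paper ships its own R, Python, and Matlab tutorials; for orientation only, a bare-bones matplotlib sketch of the three layers (half-violin ‘cloud’, jittered ‘rain’, boxplot) could look like this:

```python
# Minimal raincloud-style plot: half-violin density, jittered raw points, and a boxplot.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
data = rng.gamma(shape=2.0, scale=1.0, size=200)

fig, ax = plt.subplots(figsize=(6, 3))

# Cloud: a horizontal violin with its lower half clipped away.
parts = ax.violinplot(data, positions=[1.0], vert=False, showextrema=False)
for body in parts["bodies"]:
    vertices = body.get_paths()[0].vertices
    vertices[:, 1] = np.clip(vertices[:, 1], 1.0, None)

# Rain: jittered raw observations just below the cloud.
ax.scatter(data, 0.85 + rng.uniform(-0.04, 0.04, size=data.size), s=8, alpha=0.5)

# Summary: a narrow horizontal boxplot between cloud and rain.
ax.boxplot(data, positions=[0.95], vert=False, widths=0.05, showfliers=False)

ax.set_yticks([])
ax.set_xlabel("value")
plt.show()
```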

“The Simple but Ingenious System Taiwan Uses to Crowdsource Its Laws: VTaiwan Is a Promising Experiment in Participatory Governance. But Politics Is Blocking It from Getting Greater Traction”, Horton 2018

“The simple but ingenious system Taiwan uses to crowdsource its laws: vTaiwan is a promising experiment in participatory governance. But politics is blocking it from getting greater traction”⁠, Chris Horton (2018-08-21; ⁠, ⁠, ; backlinks; similar):

[Paper: Small et al 2021] That was when a group of government officials and activists decided to take the question to a new online discussion platform called vTaiwan. Starting in early March 2016, about 450 citizens went to vtaiwan.tw, proposed solutions, and voted on them…Three years after its founding, vTaiwan hasn’t exactly taken Taiwanese politics by storm. It has been used to debate only a couple of dozen bills, and the government isn’t required to heed the outcomes of those debates (though it may be if a new law passes later this year). But the system has proved useful in finding consensus on deadlocked issues such as the alcohol sales law, and its methods are now being applied to a larger consultation platform, called Join, that’s being tried out in some local government settings.

…vTaiwan relies on a hodgepodge of open-source tools for soliciting proposals, sharing information, and holding polls, but one of the key parts is Pol.is⁠, created by Megill and a couple of friends in Seattle after the events of Occupy Wall Street and the Arab Spring in 2011. On Pol.is, a topic is put up for debate. Anyone who creates an account can post comments on the topic, and can also upvote or downvote other people’s comments.

That may sound much like any other online forum, but 2 things make Pol.is unusual. The first is that you cannot reply to comments. “If people can propose their ideas and comments but they cannot reply to each other, then it drastically reduces the motivation for trolls to troll”, Tang says. “The opposing sides had never had a chance to actually interact with each other’s ideas.”

The second is that it uses the upvotes and downvotes to generate a kind of map [using PCA⁠/​UMAP for dimensionality reduction clustering] of all the participants in the debate, clustering together people who have voted similarly. Although there may be hundreds or thousands of separate comments, like-minded groups rapidly emerge in this voting map, showing where there are divides and where there is consensus. People then naturally try to draft comments that will win votes from both sides of a divide, gradually eliminating the gaps.
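
The article describes the opinion map only informally; as a toy illustration of the bracketed note above (dimensionality reduction on the participant × comment vote matrix, followed by clustering), one might sketch it as follows. The data, cluster count, and use of PCA plus k-means are illustrative assumptions, not Pol.is’s actual pipeline:

```python
# Toy sketch of a Pol.is-style opinion map: participants vote +1/-1/0 on comments,
# the vote matrix is reduced to 2-D, and participants are clustered into camps.
# All data here is synthetic; the real system's pipeline differs in detail.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# 200 participants x 40 comments; two latent "camps" voting in opposite directions
camp = rng.integers(0, 2, size=200)
signal = np.where(camp[:, None] == 1, 1, -1) * rng.choice([0, 1], size=(200, 40), p=[0.3, 0.7])
noise = rng.choice([-1, 0, 1], size=(200, 40), p=[0.1, 0.8, 0.1])
votes = np.clip(signal + noise, -1, 1)

coords = PCA(n_components=2).fit_transform(votes)                       # 2-D opinion map
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)

print("recovered cluster sizes:", np.bincount(labels))
```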

“The visualization is very, very helpful”, Tang says. “If you show people the face of the crowd, and if you take away the reply button, then people stop wasting time on the divisive statements.”

In one of the platform’s early successes, for example, the topic at issue was how to regulate the ride-hailing company Uber⁠, which had—as in many places around the world—run into fierce opposition from local taxi drivers. As new people joined the online debate, they were shown and asked to vote on comments that ranged from calls to ban Uber or subject it to strict regulation, to calls to let the market decide, to more general statements such as “I think that Uber is a business model that can create flexible jobs.”

Within a few days, the voting had coalesced to define 2 groups, one pro-Uber and one, about twice as large, anti-Uber. But then the magic happened: as the groups sought to attract more supporters, their members started posting comments on matters that everyone could agree were important, such as rider safety and liability insurance. Gradually, they refined them to garner more votes. The end result was a set of 7 comments that enjoyed almost universal approval, containing such recommendations as “The government should set up a fair regulatory regime”, “Private passenger vehicles should be registered”, and “It should be permissible for a for-hire driver to join multiple fleets and platforms.” The divide between pro-Uber and anti-Uber camps had been replaced by consensus on how to create a level playing field for Uber and the taxi firms, protect consumers, and create more competition. Tang herself took those suggestions into face-to-face talks with Uber, the taxi drivers, and experts, which led the government to adopt new regulations along the lines vTaiwan had produced.

Jason Hsu, a former activist, and now an opposition legislator, helped bring the vTaiwan platform into being. He says its big flaw is that the government is not required to heed the discussions taking place there. vTaiwan’s website boasts that as of August 2018, it had been used in 26 cases, with 80% resulting in “decisive government action.” As well as inspiring regulations for Uber and for online alcohol sales, it has led to an act that creates a “fintech sandbox”, a space for small-scale technological experiments within Taiwan’s otherwise tightly regulated financial system.

“It’s all solving the same problem: essentially saying, ‘What if we’re talking about things that are emergent, [for which] there are only a handful of early adopters?’” Tang says. “That’s the basic problem we were solving at the very beginning with vTaiwan.”

“Phenomic Selection: a Low-cost and High-throughput Alternative to Genomic Selection”, Rincent et al 2018

“Phenomic selection: a low-cost and high-throughput alternative to genomic selection”⁠, Renaud Rincent, Jean-Paul Charpentier, Patricia Faivre-Rampant, Etienne Paux, Jacques Le Gouis, Catherine Bastien et al (2018-04-16; ; backlinks; similar):

Genomic selection—the prediction of breeding values using DNA polymorphisms—is a disruptive method that has widely been adopted by animal and plant breeders to increase crop, forest and livestock productivity and ultimately secure food and energy supplies. It improves breeding schemes in different ways, depending on the biology of the species and genotyping and phenotyping constraints. However, both genomic selection and classical phenotypic selection remain difficult to implement because of the high genotyping and phenotyping costs that typically occur when selecting large collections of individuals, particularly in early breeding generations.

To specifically address these issues, we propose a new conceptual framework called phenomic selection, which consists of a prediction approach based on low-cost and high-throughput phenotypic descriptors rather than DNA polymorphisms. We applied phenomic selection on two species of economic interest (wheat and poplar) using near-infrared spectroscopy on various tissues. We showed that one could reach accurate predictions in independent environments for developmental and productivity traits and tolerance to disease. We also demonstrated that under realistic scenarios, one could expect much higher genetic gains with phenomic selection than with genomic selection.

Our work constitutes a proof of concept and is the first attempt at phenomic selection; it clearly provides new perspectives for the breeding community, as this approach is theoretically applicable to any organism and does not require any genotypic information.
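
As a hedged sketch of the general idea (not the authors’ models), phenomic selection replaces SNP markers with cheap spectral descriptors in an otherwise standard penalized-regression prediction; here with invented NIRS bands and ridge regression:

```python
# Sketch of the phenomic-selection idea: predict a trait from cheap high-throughput
# spectra (fake NIRS absorbance bands), much as genomic selection does with markers.
# Data, band count, and the choice of ridge regression are illustrative assumptions.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_lines, n_bands = 300, 500                      # breeding lines x spectral bands
spectra = rng.standard_normal((n_lines, n_bands))
effects = rng.standard_normal(n_bands) * 0.05    # small effects spread over the spectrum
trait = spectra @ effects + rng.standard_normal(n_lines) * 0.5

model = RidgeCV(alphas=np.logspace(-2, 4, 20))
accuracy = cross_val_score(model, spectra, trait, cv=5, scoring="r2")
print("mean predictive r2:", round(accuracy.mean(), 2))
```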

“Intact Connectional Morphometricity Learning Using Multi-view Morphological Brain Networks With Application to Autism Spectrum Disorder”, Bessadok & Rekik 2018

2018-bessadok.pdf: “Intact Connectional Morphometricity Learning Using Multi-view Morphological Brain Networks with Application to Autism Spectrum Disorder”⁠, Alaa Bessadok, Islem Rekik (2018-01-01; ; backlinks)

“We Are Not Alone! (at Least, Most of Us): Homonymy in Large Scale Social Groups”, Charpentier & Coulmont 2017

“We are not alone! (at least, most of us): Homonymy in large scale social groups”⁠, Arthur Charpentier, Baptiste Coulmont (2017-07-24; backlinks; similar):

This article brings forward an estimation of the proportion of homonyms in large scale groups based on the distribution of first names and last names in a subset of these groups.

The estimation is based on the generalization of the “birthday paradox problem”.

The main result is that, in societies such as France or the United States, identity collisions (based on first + last names) are frequent. The large majority of the population has at least one homonym. But in smaller settings, it is much less frequent: even if small groups of a few thousand people have at least one pair of homonyms, only a few individuals have a homonym.
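
The paper’s estimator is more elaborate, but the underlying birthday-paradox generalization can be sketched simply: with full-name frequencies p(i), the probability that a group of n people contains at least one homonymous pair is approximately 1 − exp(−n(n−1)/2 × Σ p(i)²). A toy calculation with an invented name distribution:

```python
# Birthday-problem approximation for homonyms: given the frequency p_i of each
# full name, P(>=1 homonymous pair among n people) ~ 1 - exp(-n(n-1)/2 * sum(p_i^2)).
# The name frequencies below are made up; the paper estimates them from real data.
import numpy as np

def collision_probability(name_probs, n):
    """Approximate probability that a group of n people contains a pair sharing a full name."""
    coincidence = np.sum(np.asarray(name_probs) ** 2)  # P(two random people share a name)
    pairs = n * (n - 1) / 2
    return 1 - np.exp(-pairs * coincidence)

# toy name distribution: a few common names plus a long tail of rare ones
probs = np.concatenate([np.full(100, 1e-4), np.full(10_000, (1 - 100 * 1e-4) / 10_000)])
for n in (100, 1_000, 10_000):
    print(n, round(collision_probability(probs, n), 4))
```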

“Long Bets As Charitable Giving Opportunity”, Branwen 2017

Long-Bets: “Long Bets as Charitable Giving Opportunity”⁠, Gwern Branwen (2017-02-24; ⁠, ⁠, ⁠, ; backlinks; similar):

Evaluating Long Bets as a prediction market shows it is dysfunctional and poorly-structured; despite the irrationality of many users, it is not good even as a way to raise money for charity.

Long Bets is a 15-year-old real-money prediction market run by the Long Now Foundation for incentivizing forecasts/​bets about long-term events of social importance such as technology or the environment. I evaluate use of Long Bets as a charitable giving opportunity by winning bets and directing the earnings to a good charity by making forecasts for all available bet opportunities and ranking them by expected value after adjusting for opportunity cost (defined by expected return of stock market indexing) and temporally discounting. I find that while there are ~41 open bets which I expect have positive expected value if counter-bets were accepted, few or none of my counter-bets were accepted. In general, LB has had almost zero activity for the past decade, and has not incentivized much forecasting. This failure is likely caused by its extreme restriction to even-odds bets, no return on bet funds (resulting in enormous opportunity costs), and lack of maintenance or publicity. All of these issues are highly likely to continue barring extensive changes to Long Bets, and I suggest that Long Bets should be wound down.
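
To make the opportunity-cost adjustment concrete, here is a hedged back-of-the-envelope version of the kind of calculation described; the function, rates, and numbers are illustrative assumptions, not the page’s actual analysis:

```python
# Sketch of the expected-value adjustment described: a Long Bets wager is even-odds,
# the stake earns no return while escrowed, so the counterfactual is investing the
# stake in an index fund. All numbers below are illustrative only.
def bet_expected_value(stake, p_win, years, market_return=0.05, discount_rate=0.03):
    """Net present value of taking one side of an even-odds bet vs. indexing the stake."""
    payout_if_win = 2 * stake                    # even odds: winner's charity gets both stakes
    expected_payout = p_win * payout_if_win      # bet funds earn nothing while escrowed
    npv_bet = expected_payout / (1 + discount_rate) ** years
    npv_indexing = stake * (1 + market_return) ** years / (1 + discount_rate) ** years
    return npv_bet - npv_indexing

# a bet you think you win with 70% probability, resolving in 15 years, with a $1,000 stake:
print(round(bet_expected_value(1_000, 0.70, 15), 2))   # negative: opportunity cost dominates
```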

“Morphometricity As a Measure of the Neuroanatomical Signature of a Trait”, Sabuncu et al 2016

“Morphometricity as a measure of the neuroanatomical signature of a trait”⁠, Mert R. Sabuncu, Tian Ge, Avram J. Holmes, Jordan W. Smoller, Randy L. Buckner, Bruce Fischl, the Alzheimer’s Disease Neuroimaging Initiative et al (2016-09-09; ; backlinks; similar):

Neuroimaging has largely focused on 2 goals: mapping associations between neuroanatomical features and phenotypes and building individual-level prediction models. This paper presents a complementary analytic strategy called morphometricity that aims to measure the neuroanatomical signatures of different phenotypes.

Inspired by prior work on [genetic] heritability, we define morphometricity as the proportion of phenotypic variation that can be explained by brain morphology (eg. as captured by structural brain MRI). In the dawning era of large-scale datasets comprising traits across a broad phenotypic spectrum, morphometricity will be critical in prioritizing and characterizing behavioral, cognitive, and clinical phenotypes based on their neuroanatomical signatures. Furthermore, the proposed framework will be important in dissecting the functional, morphological, and molecular underpinnings of different traits.

…Complex physiological and behavioral traits, including neurological and psychiatric disorders, often associate with distributed anatomical variation. This paper introduces a global metric, called morphometricity, as a measure of the anatomical signature of different traits. Morphometricity is defined as the proportion of phenotypic variation that can be explained by macroscopic brain morphology.

We estimate morphometricity via a linear mixed-effects model that uses an anatomical similarity matrix computed based on measurements derived from structural brain MRI scans. We examined over 3,800 unique MRI scans from 9 large-scale studies to estimate the morphometricity of a range of phenotypes, including clinical diagnoses such as Alzheimer’s disease, and nonclinical traits such as measures of cognition.

Our results demonstrate that morphometricity can provide novel insights about the neuroanatomical correlates of a diverse set of traits, revealing associations that might not be detectable through traditional statistical techniques.

[Keywords: neuroimaging, brain morphology, statistical association]
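
A minimal sketch of the statistical idea, under simplifying assumptions: the phenotype is modeled as y ~ N(0, σ²(m·K + (1−m)·I)) where K is the anatomical-similarity matrix, and morphometricity m is estimated by maximum likelihood. The paper fits a linear mixed-effects model with covariates; this is only the core variance-partition computation on toy data:

```python
# Minimal morphometricity sketch (not the paper's implementation): profile out the
# total variance and grid-search the proportion m explained by brain morphology.
import numpy as np

def morphometricity_ml(y, K, grid=np.linspace(0.001, 0.999, 200)):
    y = y - y.mean()
    n = len(y)
    lam, U = np.linalg.eigh(K)            # K = U diag(lam) U^T
    yr = U.T @ y                          # rotate so components are independent
    best_m, best_ll = None, -np.inf
    for m in grid:
        d = m * lam + (1 - m)             # per-component variance scale
        sigma2 = np.mean(yr**2 / d)       # profiled-out total variance
        ll = -0.5 * (n * np.log(sigma2) + np.log(d).sum() + n)
        if ll > best_ll:
            best_m, best_ll = m, ll
    return best_m

# toy example: similarity matrix built from random "morphological" features
rng = np.random.default_rng(0)
Z = rng.standard_normal((300, 50))
K = Z @ Z.T / 50                          # anatomical similarity (GRM-like)
a = Z @ rng.standard_normal(50) * 0.1     # trait signal carried by morphology
y = a + rng.standard_normal(300) * np.sqrt(np.var(a))   # ~50% "morphometricity"
print(round(morphometricity_ml(y, K), 2))
```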

“CO2/ventilation Sleep Experiment”, Branwen 2016

CO2: “CO2/ventilation sleep experiment”⁠, Gwern Branwen (2016-06-05; ⁠, ⁠, ⁠, ; backlinks; similar):

Self-experiment on whether changes in bedroom CO2 levels affect sleep quality

Some psychology studies find that CO2 impairs cognition, and some sleep studies find that better ventilation may improve sleep quality. Use of a Netatmo air quality sensor reveals that closing my bedroom tightly to reduce morning light also causes CO2 levels to spike overnight to 7x daytime levels. To investigate the possible harmful effects, I run a self-experiment randomizing an open bedroom door and a bedroom box fan (2x2) and analyze the data using a structural equation model of air quality effects on a latent sleep factor with measurement error⁠.

“Exploring Factor Model Parameters across Continuous Variables With Local Structural Equation Models”, Hildebrandt et al 2016

2016-hildebrandt.pdf: “Exploring Factor Model Parameters across Continuous Variables with Local Structural Equation Models”⁠, Andrea Hildebrandt, Oliver Lüdtke, Alexander Robitzsch, Christopher Sommer, Oliver Wilhelm (2016-04-06; similar):

Using an empirical data set, we investigated variation in factor model parameters across a continuous moderator variable and demonstrated three modeling approaches: multiple-group mean and covariance structure (MGMCS) analyses, local structural equation modeling (LSEM), and moderated factor analysis (MFA). We focused on how to study variation in factor model parameters as a function of continuous variables such as age, socioeconomic status⁠, ability levels, acculturation, and so forth. Specifically, we formalized the LSEM approach in detail as compared with previous work and investigated its statistical properties with an analytical derivation and a simulation study. We also provide code for the easy implementation of LSEM. The illustration of methods was based on cross-sectional cognitive ability data from individuals ranging in age from 4 to 23 years. Variations in factor loadings across age were examined with regard to the age differentiation hypothesis. LSEM and MFA converged with respect to the conclusions. When there was a broad age range within groups and varying relations between the indicator variables and the common factor across age, MGMCS produced distorted parameter estimates. We discuss the pros of LSEM compared with MFA and recommend using the two tools as complementary approaches for investigating moderation in factor model parameters.

[Keywords: Local structural equation model, moderated factor analysis, multiple-group mean and covariance structures, age differentiation of cognitive abilities, WJ-III tests of cognitive abilities]
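
A minimal sketch of the kernel-weighting idea behind LSEM, assuming a Gaussian kernel over the moderator: each focal point gets its own weighted sample statistics, to which the factor model would then be fitted. Only the weighting step is shown (no SEM fitting), and the data and bandwidth are invented:

```python
# Sketch of the kernel-weighting step of LSEM: at each focal value of a continuous
# moderator (e.g. age), observations are weighted by distance to the focal point
# and the model is re-estimated on the weighted sample. Here we only compute the
# weighted covariance matrices to which a factor model would then be fitted.
import numpy as np

def weighted_cov(X, w):
    w = w / w.sum()
    mu = w @ X
    Xc = X - mu
    return (Xc * w[:, None]).T @ Xc

def lsem_covariances(X, moderator, focal_points, bandwidth=2.0):
    covs = {}
    for a in focal_points:
        w = np.exp(-0.5 * ((moderator - a) / bandwidth) ** 2)   # Gaussian kernel weights
        covs[a] = weighted_cov(X, w)
    return covs

# toy data: 3 indicators whose intercorrelation declines with age (differentiation)
rng = np.random.default_rng(0)
age = rng.uniform(4, 23, size=500)
common = rng.standard_normal(500) * (1.2 - 0.03 * age)          # weaker common factor at older ages
X = common[:, None] + rng.standard_normal((500, 3))
for a, S in lsem_covariances(X, age, focal_points=[6, 12, 18]).items():
    print(a, np.round(S[0, 1], 2))   # indicator covariance shrinks with age
```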

“Do Scholars Follow Betteridge’s Law? The Use of Questions in Journal Article Titles”, Cook & Plourde 2016

2016-cook.pdf: “Do scholars follow Betteridge’s Law? The use of questions in journal article titles”⁠, James M. Cook, Dawn Plourde (2016-01-01)

“RNN Metadata for Mimicking Author Style”, Branwen 2015

RNN-metadata: “RNN Metadata for Mimicking Author Style”⁠, Gwern Branwen (2015-09-12; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Teaching a text-generating char-RNN to automatically imitate many different authors by labeling the input text by author; additional experiments include imitating Geocities and retraining GPT-2 on a large Project Gutenberg poetry corpus.

Char-RNNs are unsupervised generative models which learn to mimic text sequences. I suggest extending char-RNNs with inline metadata such as genre or author prefixed to each line of input, allowing for better & more efficient metadata, and more controllable sampling of generated output by feeding in desired metadata. A 2015 experiment using torch-rnn on a set of ~30 Project Gutenberg e-books (1 per author) to train a large char-RNN shows that a char-RNN can learn to remember metadata such as authors, learn associated prose styles, and often generate text visibly similar to that of a specified author.

I further try & fail to train a char-RNN on Geocities HTML for unclear reasons.

More successfully, I experiment in 2019 with a recently-developed alternative to char-RNNs⁠, the Transformer NN architecture, by finetuning training OpenAI’s GPT-2-117M Transformer model on a much larger (117MB) Project Gutenberg poetry corpus using both unlabeled lines & lines with inline metadata (the source book). The generated poetry is much better. And GPT-3 is better still.
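
The inline-metadata trick itself is just data formatting; a tiny sketch (file names and delimiter invented for illustration) of prefixing each training line with its author label:

```python
# Tiny sketch of the inline-metadata trick: each training line is prefixed with its
# author label so a character-level model can condition on (and be steered by) it.
# The file names and the "|" delimiter are made up for illustration.
def label_lines(path, author):
    with open(path, encoding="utf-8") as f:
        return [f"{author}|{line.rstrip()}" for line in f if line.strip()]

corpus = []
for path, author in [("dickens.txt", "DICKENS"), ("austen.txt", "AUSTEN")]:
    corpus.extend(label_lines(path, author))
# shuffled and concatenated, this becomes the model's training text;
# at sampling time, seeding with "AUSTEN|" steers generation toward that style.
```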

“LWer Effective Altruism Donations, 2013–2014”, Branwen 2015

EA-donations: “LWer Effective Altruism donations, 2013–2014”⁠, Gwern Branwen (2015-05-12; ⁠, ⁠, ; backlinks; similar):

Analysis of 2013-2014 LessWrong survey results on how much more self-identified EAers donate suggests low median donation rates due to current youth and low incomes.

A LW critic noted that the annual LW survey reported a median donation for “effective altruists” of $0, though the EA movement strongly encourages donations. I look closer at the 2013-2014 LW surveys and find in multiple regression that identifying as an EA does predict more donations after controlling for age and income, suggesting that the low EA median donation may be due to EAers having low income and youth (eg. being a student) rather than being unusually or even averagely selfish.
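
The analysis described is an ordinary multiple regression; a hedged sketch on synthetic data (variable names invented, not the LW survey’s columns) of the “EA status predicts donations after controlling for age and income” check:

```python
# Sketch of the described regression: donation amount predicted from EA identification
# while controlling for age and income. The synthetic data and coefficients are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "ea": rng.integers(0, 2, n),
    "age": rng.normal(27, 8, n).clip(18, 70),
    "income": rng.lognormal(mean=10, sigma=1, size=n),
})
df["donation"] = (0.002 * df.income + 50 * df.ea + 5 * (df.age - 18)
                  + rng.normal(0, 100, n)).clip(0)

fit = smf.ols("donation ~ ea + age + income", data=df).fit()
print(fit.params)        # EA coefficient remains positive after the controls
```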

“Reflections on How Designers Design With Data”, Bigelow et al 2014

2014-bigelow.pdf: “Reflections on How Designers Design with Data”⁠, Alex Bigelow, Steven Mark Drucker, Danyel Fisher, Miriah D. Meyer (2014-05-27; ; backlinks; similar):

In recent years many popular data visualizations have emerged that are created largely by designers whose main area of expertise is not computer science. Designers generate these visualizations using a handful of design tools and environments. To better inform the development of tools intended for designers working with data, we set out to understand designers’ challenges and perspectives.

We interviewed professional designers, conducted observations of designers working with data in the lab, and observed designers working with data in team settings in the wild.

A set of patterns emerged from these observations from which we extract a number of themes that provide a new perspective on design considerations for visualization tool creators, as well as on known engineering problems.

Patterns: In our observational studies we observed all of the designers initially sketching visual representations of data on paper, on a whiteboard, or in Illustrator. In these sketches, the designers would first draw high-level elements of their design such as the layout and axes, followed by a sketching in of data points based on their perceived ideas of data behavior (P1). An example is shown in Figure 3. The designers often relied on their understanding of the semantics of data to infer how the data might look, such as F1 anticipating that Fitbit data about walking would occur in short spurts over time while sleep data would span longer stretches. However, the designers’ inferences about data behavior were often inaccurate (P2). This tendency was acknowledged by most of the designers: after her inference from data semantics, F1 indicated that to work effectively, she would need “a better idea of the behavior of each attribute.” Similarly, B1 did not anticipate patterns in how software bugs are closed, prompting a reinterpretation and redesign of her team’s visualization much later in the design process once data behavior was explicitly explored. In the time travel studies, T3 misinterpreted one trip that later caused a complete redesign.

Furthermore, the designers’ inferences about data structure were often separated from the actual data (P3). In brainstorming sessions at the hackathon, the designers described data that would be extremely difficult or impossible to gather or derive. In working with the HBO dataset, H1 experienced frustration after he spent time writing a formula in Excel only to realize that he was recreating data he had already seen in the aggregate table…Not surprisingly, the amount of data exploration and manipulation was related to the level of a designer’s experience working with data (P4).

Past, Present, and Future of Statistical Science [COPSS 50th Anniversary Anthology]”, Lin et al 2014

2014-copss-pastpresentfuturestatistics.pdf: Past, Present, and Future of Statistical Science [COPSS 50th anniversary anthology]”⁠, Xihong Lin, Christian Genest, David L. Banks, Geert Molenberghs, David W. Scott, Jane-Ling Wang (2014-03-26; backlinks; similar):

Past, Present, and Future of Statistical Science was commissioned in 2013 by the Committee of Presidents of Statistical Societies (COPSS) to celebrate its 50th anniversary and the International Year of Statistics. COPSS consists of five charter member statistical societies in North America and is best known for sponsoring prestigious awards in statistics, such as the COPSS Presidents’ award. Through the contributions of a distinguished group of 50 statisticians who are past winners of at least one of the five awards sponsored by COPSS, this volume showcases the breadth and vibrancy of statistics, describes current challenges and new opportunities, highlights the exciting future of statistical science, and provides guidance to future generations of statisticians. The book is not only about statistics and science but also about people and their passion for discovery. Distinguished authors present expository articles on a broad spectrum of topics in statistical education, research, and applications. Topics covered include reminiscences and personal reflections on statistical careers, perspectives on the field and profession, thoughts on the discipline and the future of statistical science, and advice for young statisticians. Many of the articles are accessible not only to professional statisticians and graduate students but also to undergraduate students interested in pursuing statistics as a career and to all those who use statistics in solving real-world problems. A consistent theme of all the articles is the passion for statistics enthusiastically shared by the authors. Their success stories inspire, give a sense of statistics as a discipline, and provide a taste of the exhilaration of discovery, success, and professional accomplishment.

“This collection of reminiscences, musings on the state of the art, and advice for young statisticians makes for compelling reading. There are 52 contributions from eminent statisticians who have won a Committee of Presidents of Statistical Societies award. Each is a short, focused chapter and so one could even say this is ideal bedtime (or coffee break) reading. Anyone interested in the history of statistics will know that much has been written about the early days but little about the field since the Second World War. This book goes some way to redress this and is all the more valuable for coming from the horse’s mouth…the closing chapter, the shortest of all, from Brad Efron: a list of “thirteen rules for giving a really bad talk”. This made me laugh out loud and should be posted on the walls of all conferences. I shall leave the final word to Peter Bickel: “We should glory in this time when statistical thinking pervades almost every field of endeavor. It is really a lot of fun.”

―Robert Grant, in Significance, April 2017

The History of COPSS: “A brief history of the Committee of Presidents of Statistical Societies (COPSS)”, Ingram Olkin

Reminiscences and Personal Reflections on Career Paths

“Reminiscences of the Columbia University Department of Mathematical Statistics in the late 1940s”, Ingram Olkin · “A career in statistics”, Herman Chernoff · “. . . how wonderful the field of statistics is . . .”, David R. Brillinger · “An unorthodox journey to statistics: Equity issues, remarks on multiplicity”, Juliet Popper Shaffer · “Statistics before and after my COPSS Prize”, Peter J. Bickel · “The accidental biostatistics professor”, Donna Brogan · “Developing a passion for statistics”, Bruce G. Lindsay · “Reflections on a statistical career and their implications”, R. Dennis Cook · “Science mixes it up with statistics”, Kathryn Roeder · “Lessons from a twisted career path”, Jeffrey S. Rosenthal · “Promoting equity”, Mary Gray

Perspectives on the Field and Profession

“Statistics in service to the nation”, Stephen E. Fienberg · “Where are the majors?”, Iain M. Johnstone · “We live in exciting times”, Peter Hall · “The bright future of applied statistics”, Rafael A. Irizarry · “The road travelled: From a statistician to a statistical scientist”, Nilanjan Chatterjee · “Reflections on a journey into statistical genetics and genomics”, Xihong Lin · “Reflections on women in statistics in Canada”, Mary E. Thompson · “The whole women thing”, Nancy Reid · “Reflections on diversity”, Louise Ryan

Reflections on the Discipline

“Why does statistics have two theories?”, Donald A. S. Fraser · “Conditioning is the issue”, James O. Berger · “Statistical inference from a Dempster-Shafer perspective”, Arthur P. Dempster · “Nonparametric Bayes”, David B. Dunson · “How do we choose our default methods?”, Andrew Gelman · “Serial correlation and Durbin-Watson bounds”, T. W. Anderson · “A non-asymptotic walk in probability and statistics”, Pascal Massart · “The past’s future is now: What will the present’s future bring?”, Lynne Billard · “Lessons in biostatistics”, Norman E. Breslow · “A vignette of discovery”, Nancy Flournoy · “Statistics and public health research”, Ross L. Prentice · “Statistics in a new era for finance and health care”, Tze Leung Lai · “Meta-analyses: Heterogeneity can be a good thing”, Nan M. Laird · “Good health: Statistical challenges in personalizing disease prevention”, Alice S. Whittemore · “Buried treasures”, Michael A. Newton · “Survey sampling: Past controversies, current orthodoxy, future paradigms”, Roderick J. A. Little · “Environmental informatics: Uncertainty quantification in the environmental sciences”, Noel A. Cressie · “A journey with statistical genetics”, Elizabeth Thompson · “Targeted learning: From MLE to TMLE”, Mark van der Laan · “Statistical model building, machine learning, and the ah-ha moment”, Grace Wahba · “In praise of sparsity and convexity”, Robert J. Tibshirani · “Features of Big Data and sparsest solution in high confidence set”, Jianqing Fan · “Rise of the machines”, Larry A. Wasserman · “A trio of inference problems that could win you a Nobel Prize in statistics (if you help fund it)”, Xiao-Li Meng

Advice for the Next Generation

“Inspiration, aspiration, ambition”, C. F. Jeff Wu · “Personal reflections on the COPSS Presidents’ Award”, Raymond J. Carroll · “Publishing without perishing and other career advice”, Marie Davidian · “Converting rejections into positive stimuli”, Donald B. Rubin · “The importance of mentors”, Donald B. Rubin · “Never ask for or give advice, make mistakes, accept mediocrity, enthuse”, Terry Speed · “Thirteen rules”, Bradley Efron

“2013 LLLT Self-experiment”, Branwen 2013

LLLT: “2013 LLLT self-experiment”⁠, Gwern Branwen (2013-12-20; ⁠, ⁠, ; backlinks; similar):

An LLLT user’s blinded randomized self-experiment in 2013 on the effects of near-infrared light on a simple cognitive test battery: positive results

A short randomized & blinded self-experiment on near-infrared LED light stimulation of one’s brain yields statistically-significant dose-related improvements to 4 measures of cognitive & motor performance. Concerns include whether the blinding succeeded and why the results are so good.

“Darknet Market Mortality Risks”, Branwen 2013

DNM-survival: “Darknet Market mortality risks”⁠, Gwern Branwen (2013-10-30; ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Survival analysis of lifespans, deaths, and predictive factors of Tor-Bitcoin darknet markets

I compile a dataset of 87 public English-language darknet markets (DNMs) 2011–2016 in the vein of the famous Silk Road 1⁠, recording their openings/​closing and relevant characteristics. A survival analysis indicates the markets follow a Type TODO lifespan, with a median life of TODO months. Risk factors include TODO. With the best model, I generate estimates for the currently-operating markets.

“Psychological Measurement and Methodological Realism”, Hood 2013

2013-hood.pdf: “Psychological Measurement and Methodological Realism”⁠, S. Brian Hood (2013-08-01)

“2013 Lewis Meditation Results”, Branwen 2013

Lewis-meditation: “2013 Lewis meditation results”⁠, Gwern Branwen (2013-07-12; ⁠, ⁠, ; backlinks; similar):

Multilevel modeling of effect of small group’s meditation on math errors

A small group of Quantified Selfers tested themselves daily on arithmetic and engaged in a month of meditation. I analyze their scores with a multilevel model with per-subject grouping, and find the expected result: a small decrease in arithmetic errors which is not statistically-significant⁠, with practice & time-of-day effects (but not day-of-week or weekend effects). This suggests a longer experiment with twice as many experimenters would be needed to detect this effect.
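
A sketch of the kind of multilevel model described (math errors regressed on a meditation indicator with per-subject random intercepts), using statsmodels on synthetic data; the column names and effect sizes are invented, and the actual analysis also includes practice and time-of-day terms:

```python
# Multilevel-model sketch: per-subject random intercepts, fixed effect of meditation.
# Synthetic Poisson counts are fitted with a linear mixed model purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(6), 60)                          # 6 subjects, 60 daily sessions each
meditating = np.tile(np.r_[np.zeros(30), np.ones(30)], 6)       # second month = meditation
subject_effect = rng.normal(0, 1, 6)[subjects]
errors = rng.poisson(lam=np.exp(1.0 + subject_effect - 0.1 * meditating))

df = pd.DataFrame({"errors": errors, "meditation": meditating, "subject": subjects})
model = smf.mixedlm("errors ~ meditation", data=df, groups=df["subject"]).fit()
print(model.summary())
```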

“Weather and My Productivity”, Branwen 2013

Weather: “Weather and My Productivity”⁠, Gwern Branwen (2013-03-19; ⁠, ⁠, ⁠, ; backlinks; similar):

Rain or shine affect my mood? Not much.

Weather is often said to affect our mood, and that people in sunnier places are happier because of that. Curious about the possible effect (it could be worth controlling for in my future QS analyses or attempting to imitate benefits inside my house eg. brighter lighting), I combine my long-term daily self-ratings with logs from the nearest major official weather stations, which offer detailed weather information about temperature, humidity, precipitation, cloud cover, wind speed, brightness etc, and try to correlate them.

In general, despite considerable data, there are essentially no bivariate correlations, nothing in several versions of a linear model, and nothing found by a random forest⁠. It would appear that weather does not correlate with my self-ratings to any detectable degree, much less cause it.

“When Graphics Improve Liking but Not Learning from Online Lessons”, Sung & Mayer 2012

2012-sung.pdf: “When graphics improve liking but not learning from online lessons”⁠, Eunmo Sung, Richard E. Mayer (2012-09-01; ; similar):

  • Added instructive, decorative, seductive photos or none to an online lesson.
  • Higher satisfaction ratings for all 3 kinds of photos.
  • Higher recall test scores for instructive photos only.
  • Adding relevant photos helps learning, but adding irrelevant photos does not.

The multimedia principle states that adding graphics to text can improve student learning (Mayer 2009), but all graphics are not equally effective.

In the present study, students studied a short online lesson on distance education that contained instructive graphics (ie. directly relevant to the instructional goal), seductive graphics (ie. highly interesting but not directly relevant to the instructional goal), decorative graphics (ie. neutral but not directly relevant to the instructional goal), or no graphics.

After instruction, students who received any kind of graphic produced statistically-significantly higher satisfaction ratings than the no graphics group, indicating that adding any kind of graphic greatly improves positive feelings. However, on a recall posttest, students who received instructive graphics performed statistically-significantly better than the other 3 groups, indicating that the relevance of graphics affects learning outcomes. The 3 kinds of graphics had similar effects on affective measures but different effects on cognitive measures.

Thus, the multimedia effect is qualified by a version of the coherence principle: Adding relevant graphics to words helps learning but adding irrelevant graphics does not.

[Keywords: graphics, seductive details, e-Learning, web-based learning, multimedia effect, multimedia learning]

“Confidence Intervals for the Weighted Sum of Two Independent Binomial Proportions”, Decrouez & Robinson 2012

2012-decrouez.pdf: “Confidence Intervals for the Weighted Sum of Two Independent Binomial Proportions”⁠, Geoffrey Decrouez, Andrew P. Robinson (2012-09-01; similar):

Confidence intervals for the difference of two binomial proportions are well known; however, confidence intervals for the weighted sum of two binomial proportions are less studied. We develop and compare 7 methods for constructing confidence intervals for the weighted sum of 2 independent binomial proportions. The interval estimates are constructed by inverting the Wald test, the score test and the Likelihood ratio test. The weights can be negative, so our results generalize those for the difference between two independent proportions. We provide a numerical study that shows that these confidence intervals based on large-sample approximations perform very well, even when a relatively small amount of data is available. The intervals based on the inversion of the score test showed the best performance. Finally, we show that as for the difference of two binomial proportions, adding four pseudo-outcomes to the Wald interval for the weighted sum of two binomial proportions improves its coverage substantially, and we provide a justification for this correction.

[Keywords: border security, leakage survey, likelihood ratio test, quarantine inspection, score test, small sample, sum of proportions, Wald test]
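
A hedged sketch of the Wald-type interval for w₁p₁ + w₂p₂, with the pseudo-outcome adjustment read by analogy with the Agresti-Caffo correction for a difference of proportions (one added success and one added failure per sample); this is a reading of the abstract, not necessarily the paper’s exact formulation:

```python
# Wald-type interval for w1*p1 + w2*p2 from two independent binomials, with an
# optional +4 pseudo-outcome adjustment (one success and one failure added to each
# sample, by analogy with Agresti-Caffo; an assumption, not the paper's exact recipe).
from math import sqrt
from scipy.stats import norm

def weighted_sum_ci(x1, n1, x2, n2, w1, w2, alpha=0.05, adjust=False):
    if adjust:                       # +4 pseudo-outcomes in total
        x1, n1, x2, n2 = x1 + 1, n1 + 2, x2 + 1, n2 + 2
    p1, p2 = x1 / n1, x2 / n2
    est = w1 * p1 + w2 * p2
    se = sqrt(w1**2 * p1 * (1 - p1) / n1 + w2**2 * p2 * (1 - p2) / n2)
    z = norm.ppf(1 - alpha / 2)
    return est - z * se, est + z * se

print(weighted_sum_ci(3, 20, 15, 25, w1=0.4, w2=0.6))
print(weighted_sum_ci(3, 20, 15, 25, w1=0.4, w2=0.6, adjust=True))
```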

“The Remarkable, yet Not Extraordinary, Human Brain As a Scaled-up Primate Brain and Its Associated Cost”, Herculano-Houzel 2012

2012-herculanohouzel.pdf: “The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost”⁠, Suzana Herculano-Houzel (2012-06-19; ⁠, ⁠, ; backlinks; similar):

[Herculano-Houzel 2009] Neuroscientists have become used to a number of “facts” about the human brain: It has 100 billion neurons and 10- to 50-fold more glial cells; it is larger than expected for its body size among primates and mammals in general, and therefore the most cognitively able; it consumes an outstanding 20% of the total body energy budget despite representing only 2% of body mass because of an increased metabolic need of its neurons; and it is endowed with an overdeveloped cerebral cortex, the largest compared with brain size.

These facts led to the widespread notion that the human brain is literally extraordinary: an outlier among mammalian brains, defying evolutionary rules that apply to other species, with a uniqueness seemingly necessary to justify the superior cognitive abilities of humans over mammals with even larger brains. These facts, with deep implications for neurophysiology and evolutionary biology, are not grounded on solid evidence or sound assumptions, however.

Our recent development of a method that allows rapid and reliable quantification of the numbers of cells that compose the whole brain has provided a means to verify these facts. Here, I review this recent evidence and argue that, with 86 billion neurons and just as many nonneuronal cells, the human brain is a scaled-up primate brain in its cellular composition and metabolic cost, with a relatively enlarged cerebral cortex that does not have a relatively larger number of brain neurons yet is remarkable in its cognitive abilities and metabolism simply because of its extremely large number of neurons.

“Treadmill Desk Observations”, Branwen 2012

Treadmill: “Treadmill desk observations”⁠, Gwern Branwen (2012-06-19; ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Notes relating to my use of a treadmill desk and 2 self-experiments showing walking treadmill use interferes with typing and memory performance.

It has been claimed that doing spaced repetition review while on a walking treadmill improves memory performance. I did a randomized experiment August 2013 – May 2014 and found that using a treadmill damaged my recall performance.

“LW Anchoring Experiment”, Branwen 2012

Anchoring: “LW anchoring experiment”⁠, Gwern Branwen (2012-02-27; ⁠, ⁠, ; similar):

Do mindless positive/​negative comments skew article quality ratings up and down?

I do an informal experiment testing whether LessWrong karma scores are susceptible to a form of anchoring based on the first comment posted; a medium-large effect size is found. However, the data does not fit the assumed normal distribution, so there may or may not be any actual anchoring effect.

“The Changing History of Robustness”, Stigler 2010

2010-stigler.pdf: “The Changing History of Robustness”⁠, Stephen M. Stigler (2010-08-01)

“Continuous Parameter Estimation Model: Expanding the Standard Statistical Paradigm”, Gorsuch 2005

2005-gorsuch.pdf: “Continuous Parameter Estimation Model: Expanding the Standard Statistical Paradigm”⁠, Richard L. Gorsuch (2005-01-01)

“William Sealy Gosset”, Fienberg & Lazar 2001

2001-fienberg.pdf: “William Sealy Gosset”⁠, Stephen E. Fienberg, Nicole Lazar (2001-01-01)

“The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines”, Bradlow & Schmittlein 2000

2000-bradlow.pdf: “The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines”⁠, Eric T. Bradlow, David C. Schmittlein (2000-02-01; ⁠, ; backlinks; similar):

This research examines the ability of 6 popular Web search engines [AltaVista⁠, Northern Light⁠, Infoseek⁠, Excite⁠, HotBot⁠, Lycos], individually and collectively, to locate Web pages containing common marketing/management phrases. We propose and validate a model for search engine performance that is able to represent key patterns of coverage and overlap among the engines.

The model enables us to estimate the typical additional benefit of using multiple search engines, depending on the particular set of engines being considered. It also provides an estimate of the number of relevant Web pages not found by any of the engines. For a typical marketing/​management phrase we estimate that the “best” search engine locates about 50% of the pages, and all 6 engines together find about 90% of the total.

The model is also used to examine how properties of a Web page and characteristics of a phrase affect the probability that a given search engine will find a given page. For example, we find that the number of Web page links increases the prospect that each of the 6 search engines will find it. Finally, we summarize the relationship between major structural characteristics of a search engine and its performance in locating relevant Web pages.

[Keywords: capture-recapture⁠, hierarchical Bayes⁠, marketing information, probability models, World Wide Web]
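
The paper’s model is a hierarchical-Bayes capture-recapture; the simplest 2-engine version of that logic is the Lincoln-Petersen estimator, sketched below with invented counts, showing how overlap between engines yields an estimate of the pages no engine found:

```python
# Two-"capture" Lincoln-Petersen estimator: if engine A finds nA relevant pages,
# engine B finds nB, and they overlap on nAB, the total number of relevant pages is
# estimated as nA*nB/nAB. A simplification of the paper's hierarchical-Bayes model.
def lincoln_petersen(n_a, n_b, n_both):
    """Estimate total population size from two 'captures' with known overlap."""
    if n_both == 0:
        raise ValueError("no overlap: estimator undefined")
    return n_a * n_b / n_both

# toy numbers: engine A finds 120 pages, B finds 90, and 60 are found by both
total = lincoln_petersen(120, 90, 60)
print(total)                         # -> 180.0 estimated relevant pages in total
print(120 / total, 90 / total)       # estimated coverage of each engine
```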

…Overall, based on the Model 3 estimates in Table 8 (and consistent with Table 1), we can make 5 simple statements concerning the “best engine question”:

  1. Overall, for a randomly chosen marketing phrase and URL, the search engine most likely to find it is AltaVista.
  2. But, Northern Light is a very close second and, in fact, does slightly better than AltaVista in finding managerial phrases.
  3. HotBot is a very respectable third, locating a little over 50%–60% as many URLs as AltaVista or Northern Light.
  4. Excite and Infoseek trail more substantially, locating 20%–30% as many documents as the 2 leading engines.
  5. Lycos found 10%–15% as many documents as the 2 leaders.

“The Entropy of English Using PPM-Based Models”, Teahan 1996

1996-teahan.pdf: “The Entropy of English Using PPM-Based Models” [Data Compression Conference, DCC ’96 Proceedings] (1996-01-01; ; backlinks)

“Error Rates in Quadratic Discrimination With Constraints on the Covariance Matrices”, Flury et al 1994

1994-flury.pdf: “Error rates in quadratic discrimination with constraints on the covariance matrices”⁠, Bernhard W. Flury, Martin J. Schmid, A. Narayanan (1994-03-01; backlinks; similar):

In multivariate discrimination of several normal populations, the optimal classification procedure is based on quadratic discriminant functions⁠.

We compare expected error rates of the quadratic classification procedure if the covariance matrices are estimated under the following 4 models: (1) arbitrary covariance matrices, (2) common principal components, (3) proportional covariance matrices, and (4) identical covariance matrices.

Using Monte Carlo simulation to estimate expected error rates, we study the performance of the 4 discrimination procedures for 5 different parameter setups corresponding to “standard” situations that have been used in the literature. The procedures are examined for sample sizes ranging from 10 to 60, and for 2 to 4 groups.

Our results quantify the extent to which a parsimonious method reduces error rates, and demonstrate that choosing a simple method of discrimination is often beneficial even if the underlying model assumptions are wrong.

[Keywords: Common principal components⁠, Linear Discriminant Function, Monte Carlo simulation, Proportional Covariance Matrices]

“Another Comment on O’Cinneide”, Mallows 1991

1990-mallows.pdf: “Another comment on O’Cinneide”⁠, Colin Mallows (1991-08-01; backlinks)

“The Mean Is within One Standard Deviation of Any Median”, O’Cinneide 1990

1990-ocinneide.pdf: “The Mean is within One Standard Deviation of Any Median”⁠, Colm Art O’Cinneide (1990-01-01; backlinks)

“Statistics in Britain 1865-1930: The Social Construction of Scientific Knowledge”, MacKenzie 1981

1981-mackenzie-statisticsinbritain18651930.pdf: “Statistics in Britain 1865-1930: The Social Construction of Scientific Knowledge”⁠, Donald A. MacKenzie (1981-01-01)

“On Rereading R. A. Fisher [Fisher Memorial Lecture, With Comments]”, Savage et al 1976

1976-savage.pdf#page=46: “On Rereading R. A. Fisher [Fisher Memorial lecture, with comments]”⁠, Leonard J. Savage, John Pratt, Bradley Efron, Churchill Eisenhart, Bruno de Finetti, D. A. S. Fraser et al (1976-01-01; backlinks)

“Social Statistics”, Blalock 1960

1960-blalock-socialstatistics.pdf: “Social Statistics”⁠, Hubert M. Blalock, Jr. (1960-01-01)

“'Student' and Small Sample Theory”, Welch 1958

1958-welch.pdf: “'Student' and Small Sample Theory”⁠, B. L. Welch (1958-01-01)

“The Development of Hierarchical Factor Solutions”, Schmid & Leiman 1957

1957-schmid.pdf: “The development of hierarchical factor solutions”⁠, John Schmid, John M. Leiman (1957-03-01; ; backlinks; similar):

Although simple structure has proved to be a valuable principle for rotation of axes in factor analysis⁠, an oblique factor solution often tends to confound the resulting interpretation.

A model [Schmid-Leiman transformation] is presented here which transforms the oblique factor solution so as to preserve simple structure and, in addition, to provide orthogonal reference axes. Furthermore, this model makes explicit the hierarchical ordering of factors above the first-order domain.
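
A compact numpy sketch of the transformation, assuming an oblique first-order loading matrix and its factor correlation matrix are already in hand; for brevity the higher-order factor is taken from a principal-component (rather than principal-axis) solution of the factor correlations, so the numbers are illustrative rather than a faithful reproduction of Schmid & Leiman’s procedure:

```python
# Schmid-Leiman sketch: re-express an oblique solution (loadings L, factor
# correlations Phi) as one orthogonal general factor plus orthogonal group factors.
import numpy as np

def schmid_leiman(L, Phi):
    eigval, eigvec = np.linalg.eigh(Phi)
    g = eigvec[:, -1] * np.sqrt(eigval[-1])      # second-order loadings of each factor
    if g.sum() < 0:                              # fix sign so loadings come out positive
        g = -g
    u = np.sqrt(np.clip(1 - g**2, 0, None))      # second-order uniquenesses
    general = L @ g                              # loadings on the general factor
    group = L * u                                # residualized group-factor loadings
    return np.column_stack([general, group])

# toy example: 6 variables, 2 correlated first-order factors
L = np.array([[.7, 0], [.6, 0], [.5, 0],
              [0, .7], [0, .6], [0, .5]])
Phi = np.array([[1.0, 0.6],
                [0.6, 1.0]])
print(np.round(schmid_leiman(L, Phi), 2))
```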

“On the Non-Existence of Tests of "Student’s" Hypothesis Having Power Functions Independent of σ”, Dantzig 1940

1940-dantzig.pdf: “On the Non-Existence of Tests of "Student’s" Hypothesis Having Power Functions Independent of σ”⁠, George B. Dantzig (1940-06-01)

“Professor Ronald Aylmer Fisher [profile]”, Mahalanobis 1938

1938-mahalanobis.pdf: “Professor Ronald Aylmer Fisher [profile]”⁠, P. C. Mahalanobis (1938-01-01)

“The Method of Path Coefficients”, Wright 1934

1934-wright.pdf: “The Method of Path Coefficients”⁠, Sewall Wright (1934-01-01)

“Correlation Calculated from Faulty Data”, Spearman 1910

1910-spearman.pdf: “Correlation Calculated from Faulty Data”⁠, Charles Spearman (1910-01-01)

“Some Experimental Results in the Correlation of Mental Abilities”, Brown 1910

1910-brown.pdf: “Some Experimental Results in the Correlation of Mental Abilities”⁠, William Brown (1910-01-01)

“The Proof and Measurement of Association between Two Things”, Spearman 1904

1904-spearman.pdf: “The proof and measurement of association between two things”⁠, Charles Spearman (1904; ; backlinks; similar):

[Attempted to study the scientific correlation between 2 things. Any correlational experiment can only be regarded as a sample and presents a certain amount of accidental deviation from the real general tendency. Accidental deviation can be measured by the ‘probable error’. Accidental deviation depends on the number of cases, and on the largeness of existing correspondence. Probable error varies according to the method of calculation. While the number of Subjects helps to reduce accidental deviation, it has no effect upon systematic deviation, except that it indirectly leads to an augmentation. Therefore, the number of cases should be determined by the principle that the measurements to be aggregated together should have their error brought to the same general order of magnitude. Suggests that probable errors must be kept down to limits small enough for the particular object of investigation to be proved.


Early in the 20th century, Spearman published several articles in the American Journal of Psychology on experimental methodology in general and the method of correlation in particular. They are all-important and seminal works. The following article, “The proof and measurement of association between 2 things”, is important because Spearman published it as “a commencement at attempting to remedy” a problem in the experimentation of his day in which “laborious series of experiments are executed and published with the purpose of demonstrating some connection between 2 events” but in which experimental importance is not ascertainable.

This article is Spearman’s explication of the adaptation of Pearson’s product-moment correlation statistic to experimental results in psychology. It should be noted that Spearman made an error in his correlation formula on page 77. He defines x and y as medians rather than means. He caught it himself and made the correction in a later article.

It is hard to calculate the impact of Spearman’s article on modern psychology except to say that it has been immense.]

Miscellaneous