notes/Variance-components (Link Bibliography)

“notes/​Variance-components” links:

  1. ⁠, Renaud Rincent, Jean-Paul Charpentier, Patricia Faivre-Rampant, Etienne Paux, Jacques Le Gouis, Catherine Bastien, Vincent Segura (2018-04-16):

    Genomic selection—the prediction of breeding values using DNA polymorphisms—is a disruptive method that has widely been adopted by animal and plant breeders to increase crop, forest and livestock productivity and ultimately secure food and energy supplies. It improves breeding schemes in different ways, depending on the biology of the species and genotyping and phenotyping constraints. However, both genomic selection and classical phenotypic selection remain difficult to implement because of the high genotyping and phenotyping costs that typically occur when selecting large collections of individuals, particularly in early breeding generations. To specifically address these issues, we propose a new conceptual framework called phenomic selection, which consists of a prediction approach based on low-cost and high-throughput phenotypic descriptors rather than DNA polymorphisms. We applied phenomic selection on two species of economic interest (wheat and poplar) using near-infrared spectroscopy on various tissues. We showed that one could reach accurate predictions in independent environments for developmental and productivity traits and tolerance to disease. We also demonstrated that under realistic scenarios, one could expect much higher genetic gains with phenomic selection than with genomic selection. Our work constitutes a proof of concept and is the first attempt at phenomic selection; it clearly provides new perspectives for the breeding community, as this approach is theoretically applicable to any organism and does not require any genotypic information.
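
    [The mechanics are simple enough to sketch: phenomic selection reuses the machinery of genomic selection, substituting the NIRS spectrum for the marker matrix as the high-dimensional predictor in a penalized regression fitted on phenotyped individuals and applied to unphenotyped ones. A minimal simulated illustration (hypothetical sizes and noise levels, not the authors' wheat/poplar pipeline):]

    ```python
    # Sketch of phenomic prediction's core idea: swap marker genotypes for NIRS
    # spectra as the predictor matrix in a penalized linear model, then predict
    # individuals without trait records. All data simulated for illustration.
    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(0)
    n, p = 300, 2000                                     # individuals x wavelengths
    spectra = rng.normal(size=(n, p))                    # stand-in for NIRS absorbances
    beta = rng.normal(size=p) * (rng.random(p) < 0.05)   # sparse "signal" in the spectra
    y = spectra @ beta + rng.normal(scale=3.0, size=n)   # trait values

    train, test = np.arange(200), np.arange(200, n)
    model = RidgeCV(alphas=np.logspace(-2, 4, 20)).fit(spectra[train], y[train])
    r = np.corrcoef(model.predict(spectra[test]), y[test])[0, 1]
    print(f"predictive ability in held-out individuals: r = {r:.2f}")
    ```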

  2. ⁠, Joseph L. Gage, Elliot Richards, Nicholas Lepak, Nicholas Kaczmar, Chinmay Soman, Girish Chowdhary, Michael A. Gore, Edward S. Buckler (2019-09-10):

    Collecting useful, interpretable, and biologically relevant phenotypes in a resource-efficient manner is a bottleneck to plant breeding, genetic mapping, and genomic prediction. Autonomous and affordable sub-canopy rovers are an efficient and scalable way to generate sensor-based datasets of in-field crop plants. Rovers equipped with light detection and ranging (LiDAR) can produce three-dimensional reconstructions of entire hybrid maize fields. In this study, we collected 2,103 LiDAR scans of hybrid maize field plots and extracted phenotypic data from them by Latent Space Phenotyping (LSP). We performed LSP by two methods, principal component analysis (PCA) and a convolutional autoencoder, to extract meaningful, quantitative Latent Space Phenotypes (LSPs) describing whole-plant architecture and biomass distribution. The LSPs had heritabilities of up to 0.44, similar to some manually measured traits, indicating they can be selected on or genetically mapped. Manually measured traits can be successfully predicted by using LSPs as explanatory variables in partial least squares regression, indicating the LSPs contain biologically relevant information about plant architecture. These techniques can be used to assess crop architecture at a reduced cost and in an automated fashion for breeding, research, or extension purposes, as well as to create or inform crop growth models.

  3. ⁠, Jordan Ubbens, Mikolaj Cieslak, Przemyslaw Prusinkiewicz, Ian Stavness (2019-04-15):

    Association mapping studies have enabled researchers to identify candidate loci for many important environmental resistance factors, including agronomically relevant resistance traits in plants. However, traditional genotype-by-environment studies such as these require a phenotyping pipeline which is capable of accurately and consistently measuring stress responses, typically in an automated high-throughput context using image processing. In this work, we present Latent Space Phenotyping (LSP), a novel phenotyping method which is able to automatically detect and quantify response-to-treatment directly from images. Using two synthetically generated image datasets, we first show that LSP is able to successfully recover the simulated QTL in both simple and complex synthetic imagery. We then demonstrate an example application of LSP to an interspecific cross of the model C4 grass Setaria. We propose LSP as an alternative to traditional image analysis methods for phenotyping, enabling association mapping studies without the need for engineering complex image processing pipelines.

  4. ⁠, Daniel E. Runcie, Jiayi Qu, Hao Cheng, Lorin Crawford (2021-07-23):

    [Previously: Grid-LMM (Runcie & Crawford 2019).] Large-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as human genetics. However, the statistical foundation of multi-trait genomic prediction is based on the multivariate linear mixed model (MvLMM), a tool notorious for its fragility when applied to more than a handful of traits. We present MegaLMM, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits. Using 3 examples with real plant data, we show that MegaLMM can leverage thousands of traits at once to substantially improve genetic value prediction accuracy.

    …Here, we describe MegaLMM (linear mixed models for millions of observations), a novel statistical method and computational algorithm for fitting massive-scale MvLMMs to large-scale phenotypic datasets. Although we focus on plant breeding applications for concreteness, our method can be broadly applied wherever multi-trait linear mixed models are used (e.g., human genetics, industrial experiments, psychology, linguistics, etc.). MegaLMM dramatically improves upon existing methods that fit low-rank MvLMMs, allowing multiple random effects and un-balanced study designs with large amounts of missing data. We achieve both scalability and statistical robustness by combining strong, but biologically motivated, Bayesian priors for statistical regularization—analogous to the p ≫ n approach of genomic prediction methods—with algorithmic innovations recently developed for LMMs. In the 3 examples below, we demonstrate that our algorithm maintains high predictive accuracy for tens-of-thousands of traits, and dramatically improves the prediction of genetic values over existing methods when applied to data from real breeding programs.

    …Together, the set of parallel univariate LMMs and the set of factor loading vectors result in a novel and very general re-parameterization of the MvLMM framework as a mixed-effect factor model. This parameterization leads to dramatic computational performance gains by avoiding all large matrix inversions. It also serves as a scaffold for eliciting Bayesian priors that are intuitive and provide powerful regularization which is necessary for robust performance with limited data. Our default distributions encourage: (1) shrinkage on the factor-trait correlations (λjk) to avoid over-fitting covariances, and (2) shrinkage on the factor sizes to avoid including too many latent traits. This 2-dimensional regularization helps the model focus only on the strongest, most relevant signals in the data.
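
    [Schematically (an editorial sketch of the decomposition described above, not the paper's full prior specification), MegaLMM re-writes the n × t trait matrix Y as a factor model,

    $$Y = F\Lambda + E, \qquad f_k = u_k + e_k, \quad u_k \sim \mathcal{N}(0,\, \sigma^2_k G),$$

    where Λ holds the factor-trait loadings λ_jk, each column f_k of the latent factors F (and each residual column of E) is fitted with its own univariate LMM given a kinship matrix G, and the two shrinkage priors act on the λ_jk and on the overall magnitude of each factor.]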

    Model limitations: While MegaLMM works well across a wide range of applications in breeding programs, our approach does have some limitations.

    First, since MegaLMM is built on the Grid-LMM framework for efficient likelihood calculations, it does not scale well to large numbers of observations (in contrast to large numbers of traits), or large numbers of random effects. As the number of observational units increases, MegaLMM’s memory requirements increase quadratically because of the requirement to store sets of pre-calculated inverse-variance matrices. Similarly, for each additional random effect term included in the model, memory requirements increase exponentially. Therefore, we generally limit models to fewer than 10,000 observations [n] and only 1-to-4 random effect terms per trait. There may be opportunities to reduce this memory burden if some of the random effects are low-rank; then these random effects could be updated on the fly using efficient routines for low-rank Cholesky updates. We also do not currently suggest including regressions directly on markers and have used marker-based kinship matrices here instead for computational efficiency. Therefore as a stand-alone prediction method, MegaLMM requires calculations involving the Schur complement of the joint kinship matrix of the testing and training individuals which can be computationally costly.
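
    [The prediction step alluded to is the usual conditional-Gaussian (GBLUP-style) identity: partitioning the joint kinship matrix G over training and testing individuals,

    $$\hat{g}_{\text{test}} = G_{\text{test,train}}\, G_{\text{train,train}}^{-1}\, \hat{g}_{\text{train}}, \qquad \operatorname{Var}(g_{\text{test}} \mid g_{\text{train}}) = G_{\text{test,test}} - G_{\text{test,train}}\, G_{\text{train,train}}^{-1}\, G_{\text{train,test}},$$

    where the second term is the Schur complement whose computation becomes the bottleneck as the joint matrix grows.]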

    Second, MegaLMM is inherently a linear model and cannot effectively model trait relationships that are non-linear. Some non-linear relationships between predictor variables (like genotypes) and traits can be modeled through non-linear kernel matrices, as we demonstrated with the RKHS application to the Bread Wheat data. However, allowing non-linear relationships among traits is currently beyond the capacity of our software and modeling approach. Extending our mixed effect model on the low-dimensional latent factor space to a non-linear modeling structure like a neural network may be an exciting area for future research. Also, some sets of traits may not have low-rank correlation structures that are well-approximated by a factor model. For example, certain auto-regressive dependence structures are full-rank and cannot efficiently be decomposed into a discrete set of factors.

    Nevertheless, we believe that in its current form, MegaLMM will be useful to a wide range of researchers in quantitative genetics and plant breeding.

  5. ⁠, Gustavo de los Campos, Torsten Pook, Agustin Gonzalez-Raymundez, Henner Simianer, George Mias, Ana I. Vazquez (2020-02-15):

    Motivation

    Modern genomic data sets often involve multiple data-layers (e.g., DNA-sequence, gene expression), each of which itself can be high-dimensional. The biological processes underlying these data-layers can lead to intricate multivariate association patterns.

    Results

    We propose and evaluate two methods for analysis of variance when both input and output sets are high-dimensional. Our approach uses random effects models to estimate the proportion of variance of vectors in the linear span of the output set that can be explained by regression on the input set. We consider a method based on an orthogonal basis (Eigen-ANOVA) and one that uses random vectors (Monte Carlo ANOVA, MC-ANOVA) in the linear span of the output set. We used simulations to assess the bias and variance of each of the methods, and to compare them with those of Partial Least Squares (PLS)–an approach commonly used in multivariate high-dimensional regressions. The MC-ANOVA method gave nearly unbiased estimates in all the simulation scenarios considered. Estimates produced by Eigen-ANOVA and PLS had noticeable biases. Finally, we demonstrate the insight that can be obtained with MC-ANOVA and Eigen-ANOVA by applying these two methods to the study of multi-locus linkage disequilibrium in chicken genomes and to the assessment of inter-dependencies between gene expression, methylation and copy-number-variants in data from breast cancer tumors.
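
    [A conceptual sketch of the MC-ANOVA idea (the paper's implementation uses random effects models and ships as R scripts in the supplement; this cross-validated ridge stand-in, on simulated data, is only illustrative): draw random vectors in the linear span of the output set, estimate how well each is predicted from the input set, and average.]

    ```python
    # Monte Carlo ANOVA, conceptually: average the variance explained by X
    # over random vectors z lying in the linear span of the output set Y.
    # Simulated data; ridge + cross-validation stand in for the paper's
    # random-effects machinery.
    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(1)
    n = 500
    X = rng.normal(size=(n, 50))                     # input layer (e.g., SNPs)
    Y = X @ rng.normal(size=(50, 20)) + rng.normal(scale=2.0, size=(n, 20))  # output layer

    r2s = []
    for _ in range(200):
        w = rng.normal(size=Y.shape[1])
        z = Y @ w                                    # random vector in span(Y)
        z = (z - z.mean()) / z.std()
        zhat = cross_val_predict(RidgeCV(), X, z, cv=5)
        r2s.append(1 - np.mean((z - zhat) ** 2) / np.var(z))
    print(f"average proportion of span(Y) variance explained by X: {np.mean(r2s):.2f}")
    ```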

    Availability

    The Supplementary data includes an R-implementation of each of the proposed methods as well as the scripts used in simulations and in the real-data analyses.

    Contact

    gustavoc@msu.edu

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  6. ⁠, Andrew Whalen, Chris Gaynor, John M. Hickey (2020-03-02):

    In this paper we develop and test a method which uses high-throughput phenotypes to infer the genotypes of an individual. The inferred genotypes can then be used to perform genomic selection. Previous methods which used high-throughput phenotype data to increase the accuracy of selection assumed that the high-throughput phenotypes correlate with selection targets. When this is not the case, we show that the high-throughput phenotypes can be used to determine which haplotypes an individual inherited from their parents, and thereby infer the individual’s genotypes. We tested this method in two simulations. In the first simulation, we explored how the accuracy of the inferred genotypes depended on the high-throughput phenotypes used and the genome of the species analysed. In the second simulation we explored whether using this method could increase genetic gain in a plant breeding program by enabling genomic selection on non-genotyped individuals. In the first simulation, we found that genotype accuracy was higher if more high-throughput phenotypes were used and if those phenotypes had higher heritability. We also found that genotype accuracy decreased with increasing genome size. In the second simulation, we found that the inferred genotypes could be used to enable genomic selection on non-genotyped individuals and increase genetic gain compared to random selection, or in some scenarios phenotypic selection. This method presents a novel way for using high-throughput phenotype data in breeding programs. As the quality of high-throughput phenotypes increases and the cost decreases, this method may enable the use of genomic selection on large numbers of non-genotyped individuals.

  7. ⁠, Thomas Battram, Tom R. Gaunt, Doug Speed, Nicholas J. Timpson, Gibran Hemani (2020-10-10):

    Following years of epigenome-wide association studies (EWAS), traits analysed to date tend to yield few associations. Reinforcing this observation, we conducted EWAS on 400 traits and 16 yielded at least one association at the conventional significance threshold (p < 1×10−7). To investigate why EWAS yield is low, we formally estimated the proportion of phenotypic variation captured by 421,693 blood derived DNA methylation markers (h²_EWAS) across all 400 traits. The mean h²_EWAS was zero, with evidence for regular cigarette smoking exhibiting the largest association with all markers (h²_EWAS = 0.42) and the only one surpassing a false discovery rate < 0.1. Though underpowered to determine the h²_EWAS value for any one trait, h²_EWAS was predictive of the number of EWAS hits across the traits analysed (AUC = 0.7). Modelling the contributions of the methylome on a per-site versus a per-region basis gave varied h²_EWAS estimates (r = 0.47) but neither approach obtained substantially higher model fits across all traits. Our analysis indicates that most complex traits do not heavily associate with markers commonly measured in EWAS within blood. However, it is likely DNA methylation does capture variation in some traits and h²_EWAS may be a reasonable way to prioritise traits that are likely to yield associations.
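
    [Concretely, h²_EWAS has the same variance-component form as SNP-heritability, with a methylation relationship matrix M = ZZ′/p built from the p standardized probe values in place of the genomic relationship matrix:]

    $$y = \mu + m + e, \quad m \sim \mathcal{N}(0,\, \sigma^2_m M), \quad e \sim \mathcal{N}(0,\, \sigma^2_e I), \qquad h^2_{\text{EWAS}} = \frac{\sigma^2_m}{\sigma^2_m + \sigma^2_e}.$$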

  8. ⁠, Daphna Rothschild, Omer Weissbrod, Elad Barkan, Tal Korem, David Zeevi, Paul I. Costea, Anastasia Godneva, Iris Kalka, Noam Bar, Niv Zmora, Meirav Pevsner-Fischer, David Israeli, Noa Kosower, Gal Malka, Bat Chen Wolf, Tali Avnit-Sagi, Maya Lotan-Pompan, Adina Weinberger, Zamir Halpern, Shai Carmi, Eran Elinav, Eran Segal (2017-06-26):

    Human gut microbiome composition is shaped by multiple host intrinsic and extrinsic factors, but the relative contribution of host genetic compared to environmental factors remains elusive. Here, we genotyped a cohort of 696 healthy individuals from several distinct ancestral origins and a relatively common environment, and demonstrate that there is no statistically-significant association between microbiome composition and ethnicity, single nucleotide polymorphisms (SNPs), or overall genetic similarity, and that only 5 of 211 (2.4%) previously reported microbiome associations replicate in our cohort. In contrast, we find similarities in the microbiome composition of genetically unrelated individuals who share a household. We define the term biome-explainability as the variance of a host phenotype explained by the microbiome after accounting for the contribution of human genetics. Consistent with our finding that microbiome and host genetics are largely independent, we find significant biome-explainability levels of 16–33% for body mass index (BMI), fasting glucose, high-density lipoprotein (HDL) cholesterol, waist circumference, waist-hip ratio (WHR), and lactose consumption. We further show that several human phenotypes can be predicted substantially more accurately when adding microbiome data to host genetics data, and that the contribution of both data sources to prediction accuracy is largely additive. Overall, our results suggest that human microbiome composition is dominated by environmental factors rather than by host genetics.
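
    [Schematically, biome-explainability b² comes from a mixed model with two similarity matrices, a genetic kinship K and a microbiome similarity B:]

    $$y = X\beta + g + b + e, \quad g \sim \mathcal{N}(0,\, \sigma^2_g K), \quad b \sim \mathcal{N}(0,\, \sigma^2_b B), \qquad b^2 = \sigma^2_b / \sigma^2_y,$$

    [so b² is the share of phenotypic variance assigned to microbiome composition after the genetic component is accounted for.]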

  9. ⁠, Mert R. Sabuncu, Tian Ge, Avram J. Holmes, Jordan W. Smoller, Randy L. Buckner, Bruce Fischl, the Alzheimer's Disease Neuroimaging Initiative (2016-09-09):

    Neuroimaging has largely focused on 2 goals: mapping associations between neuroanatomical features and phenotypes, and building individual-level prediction models. This paper presents a complementary analytic strategy called morphometricity that aims to measure the neuroanatomical signatures of different phenotypes.

    Inspired by prior work on [genetic] heritability, we define morphometricity as the proportion of phenotypic variation that can be explained by brain morphology (eg., as captured by structural brain MRI). In the dawning era of large-scale datasets comprising traits across a broad phenotypic spectrum, morphometricity will be critical in prioritizing and characterizing behavioral, cognitive, and clinical phenotypes based on their neuroanatomical signatures. Furthermore, the proposed framework will be important in dissecting the functional, morphological, and molecular underpinnings of different traits.

    …Complex physiological and behavioral traits, including neurological and psychiatric disorders, often associate with distributed anatomical variation. This paper introduces a global metric, called morphometricity, as a measure of the anatomical signature of different traits. Morphometricity is defined as the proportion of phenotypic variation that can be explained by macroscopic brain morphology.

    We estimate morphometricity via a linear mixed-effects model that uses an anatomical similarity matrix computed based on measurements derived from structural brain MRI scans. We examined over 3,800 unique MRI scans from 9 large-scale studies to estimate the morphometricity of a range of phenotypes, including clinical diagnoses such as Alzheimer’s disease, and nonclinical traits such as measures of cognition.
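
    [The underlying model is the same variance-components form used for SNP-heritability, with an anatomical similarity matrix S in place of genetic kinship: y = Xβ + a + ε with a ~ N(0, σ²_a S), and morphometricity m² = σ²_a ∕ (σ²_a + σ²_ε). A toy numerical illustration of how a similarity matrix becomes a variance fraction (the paper estimates σ²_a by restricted maximum likelihood; the simpler Haseman-Elston-style moment estimator below is only a stand-in, run on simulated data):]

    ```python
    # Toy moment-based (Haseman-Elston-style) estimate of morphometricity:
    # regress phenotype cross-products y_i*y_j on similarity entries S_ij.
    # The paper itself uses ReML; this only illustrates the logic, and the
    # estimate is noisy.
    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 1000, 5000
    Z = rng.normal(size=(n, p))            # standardized MRI-derived measurements
    S = Z @ Z.T / p                        # anatomical similarity matrix
    m2_true = 0.6
    a = rng.multivariate_normal(np.zeros(n), m2_true * S)    # morphological signal
    y = a + rng.normal(scale=np.sqrt(1 - m2_true), size=n)   # phenotype
    y = (y - y.mean()) / y.std()

    iu = np.triu_indices(n, k=1)           # off-diagonal pairs only
    cross = (y[:, None] * y[None, :])[iu]
    m2_hat = np.sum(S[iu] * cross) / np.sum(S[iu] ** 2)      # HE regression slope
    print(f"estimated morphometricity ~ {m2_hat:.2f} (simulated truth: {m2_true})")
    ```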

    Our results demonstrate that morphometricity can provide novel insights about the neuroanatomical correlates of a diverse set of traits, revealing associations that might not be detectable through traditional statistical techniques.

    [Keywords: neuroimaging, brain morphology, statistical association]

  10. https://www.sciencedirect.com/science/article/pii/S0896627317310929

  11. 2018-bessadok.pdf: “Intact Connectional Morphometricity Learning Using Multi-view Morphological Brain Networks with Application to Autism Spectrum Disorder”⁠, Alaa Bessadok, Islem Rekik

  12. http://matthewtoews.com/papers/IPMI2019_BoF_Manifold_Laurent.pdf

  13. https://www.nature.com/articles/s41467-019-10317-7

  14. ⁠, Baptiste Couvy-Duchesne, Lachlan T. Strike, Futao Zhang, Yan Holtz, Zhili Zheng, Kathryn E. Kemper, Loic Yengo, Olivier Colliot, Margaret J. Wright, Naomi R. Wray, Jian Yang, Peter M. Visscher (2019-07-09):

    The recent availability of large-scale neuroimaging cohorts (here the UK Biobank [UKB] and the Human Connectome Project [HCP]) facilitates deeper characterisation of the relationship between phenotypic and brain architecture variation in humans. We tested the association between 654,386 vertex-wise measures of cortical and subcortical morphology (from T1w and T2w MRI images) and behavioural, cognitive, psychiatric and lifestyle data. We found a statistically-significant association of grey-matter structure with 58 out of 167 UKB phenotypes spanning substance use, blood assay results, education or income level, diet, depression, being a twin as well as cognition domains (UKB discovery sample: n = 9,888). Twenty-three of the 58 associations replicated (UKB replication sample: n = 4,561; HCP, n = 1,110). In addition, differences in body size (height, weight, BMI, waist and hip circumference, body fat percentage) could account for a substantial proportion of the association, providing possible insight into previous MRI studies for psychiatric disorders where case status is associated with body mass index. Using the same linear mixed model, we showed that most of the associated characteristics (e.g. age, sex, body size, diabetes, being a twin, maternal smoking) could be significantly predicted using all the brain measurements in out-of-sample prediction. Finally, we demonstrated other applications of our approach, including a Region Of Interest (ROI) analysis that retains the vertex-wise complexity, and a ranking of the information contained across MRI processing options.

    Highlights: Our linear mixed model approach unifies association and prediction analyses for highly dimensional vertex-wise MRI data

    Grey-matter structure is associated with measures of substance use, blood assay results, education or income level, diet, depression, being a twin as well as cognition domains

    Body size (height, weight, BMI, waist and hip circumference) is an important source of covariation between the phenome and grey-matter structure

    Grey-matter scores quantify grey-matter based risk for the associated traits and allow the study of phenotypes not collected

    The most general cortical processing (“fsaverage” mesh with no smoothing) maximises the brain-morphometricity for all UKB phenotypes

  15. 2019-couvyduchesne-figure1-morphometricity.jpg

  16. ⁠, Baptiste Couvy-Duchesne, Lachlan T. Strike, Futao Zhang, Yan Holtz, Zhili Zheng, Kathryn E. Kemper, Loic Yengo, Olivier Colliot, Margaret J. Wright, Naomi R. Wray, Jian Yang, Peter M. Visscher (2020-07-20):

    The recent availability of large-scale neuroimaging cohorts facilitates deeper characterisation of the relationship between phenotypic and brain architecture variation in humans. Here, we investigate the association (previously coined morphometricity) of a phenotype with all 652,283 vertex-wise measures of cortical and subcortical morphology in a large data set from the UK Biobank (UKB; n = 9,497 for discovery, n = 4,323 for replication) and the Human Connectome Project (n = 1,110).

    We used a linear mixed model with the brain measures of individuals fitted as random effects with covariance relationships estimated from the imaging data. We tested 167 behavioural, cognitive, psychiatric or lifestyle phenotypes and found statistically-significant morphometricity for 58 phenotypes (spanning substance use, blood assay results, education or income level, diet, depression, and cognition domains), 23 of which replicated in the UKB replication set or the HCP. We then extended the model for a bivariate analysis to estimate grey-matter correlation between phenotypes, which revealed that body size (ie., height, weight, BMI, waist and hip circumference, body fat percentage) could account for a substantial proportion of the morphometricity (confirmed using a conditional analysis), providing possible insight into previous MRI case-control results for psychiatric disorders where case status is associated with body mass index. Our LMM framework also allowed us to predict some of the associated phenotypes from the vertex-wise measures, in two independent samples. Finally, we demonstrated additional new applications of our approach: (a) region of interest (ROI) analysis that retains the vertex-wise complexity; (b) comparison of the information retained by different MRI processing options.

  17. 2019-he.pdf: “Predicting human inhibitory control from brain structural MRI”⁠, Ningning He, Edmund T. Rolls, Wei Zhao, Shuixia Guo

  18. ⁠, Baptiste Couvy-Duchesne, Johann Faouzi, Benoît Martin, Elina Thibeau-Sutre, Adam Wild, Manon Ansart, Stanley Durrleman, Didier Dormont, Ninon Burgos, Olivier Colliot (2020-12-15):

    We ranked third in the Predictive Analytics Competition (PAC) 2019 challenge by achieving a mean absolute error (MAE) of 3.33 years in predicting age from T1-weighted MRI brain images. Our approach combined seven algorithms that allow generating predictions when the number of features exceeds the number of observations, in particular, two versions of best linear unbiased predictor (BLUP), support vector machine (SVM), two shallow convolutional neural networks (CNNs), and the famous and Inception V1. Ensemble learning was derived from estimating weights via linear regression in a hold-out subset of the training sample. We further evaluated and identified factors that could influence prediction accuracy: choice of algorithm, ensemble learning, and features used as input/MRI image processing. Our prediction error was correlated with age, and absolute error was greater for older participants, suggesting that the training sample should be increased for this subgroup. Our results may be used to guide researchers to build age predictors on healthy individuals, which can be used in research and in the clinic as non-specific predictors of disease status.

    [Keywords: brain age, MRI, machine learning, deep learning, statistical learning, ensemble learning]

    Morphometricity of Age as Upper Bound of Prediction Accuracy: From BLUP models, we estimated the total association between age and the brain features. Morphometricity is expressed as a proportion of the variance (R²) of age; thus, it quantifies how much of the differences in age in the sample may be attributed to/associated with variation in brain structure. With surface-based processing (~650,000 vertices), we estimated the morphometricity to be R² = 0.99 (SE = 0.052), while for volume-based processing (~480,000 voxels), it reached R² = 0.97 (SE = 0.015).
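
    [A minimal sketch of why BLUP remains tractable with ~650,000 features but only a few thousand subjects: rather than the p × p feature covariance, all computation runs through the n × n brain-relationship matrix, as in kernel ridge regression. Simulated data and an arbitrary ridge parameter, not the authors' exact implementation:]

    ```python
    # BLUP/kernel-ridge prediction when features (vertices/voxels) vastly
    # outnumber subjects: work with the n x n similarity matrix K = ZZ'/p.
    # Simulated data; in practice lambda comes from the estimated variance
    # components rather than being fixed by hand.
    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 400, 6000                       # scaled down for a quick demo
    Z = rng.normal(size=(n, p))
    Z = (Z - Z.mean(0)) / Z.std(0)         # standardize each brain feature
    age = Z @ rng.normal(scale=0.05, size=p) + rng.normal(size=n)

    K = Z @ Z.T / p                        # n x n brain-relationship matrix
    train, test = np.arange(300), np.arange(300, n)
    lam = 0.5                              # ridge parameter ~ sigma2_e / sigma2_g
    alpha = np.linalg.solve(K[np.ix_(train, train)] + lam * np.eye(len(train)),
                            age[train] - age[train].mean())
    pred = age[train].mean() + K[np.ix_(test, train)] @ alpha
    print(f"MAE on held-out subjects: {np.mean(np.abs(pred - age[test])):.2f} years")
    ```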

  19. ⁠, Baptiste Couvy-Duchesne, Futao Zhang, Kathryn E. Kemper, Julia Sidorenko, Naomi R. Wray, Peter M. Visscher, Olivier Colliot, Jian Yang (2021-01-22):

    Covariance between grey-matter measurements can reflect structural or functional brain networks, though it has also been shown to be influenced by factors (e.g. age, head size, scanner), which could lead to lower mapping precision (increased size of associated clusters) and create distal false-positive associations in mass-univariate vertex-wise analyses.

    We evaluated this concern by performing state-of-the-art mass-univariate analyses (general linear model, GLM) on traits simulated from real vertex-wise grey matter data (including cortical and subcortical thickness and surface area). We contrasted the results with those from linear mixed models (LMMs), which have been shown to overcome similar issues in omics association studies.

    We showed that when performed on a large sample (n = 8,662, UK Biobank), GLMs yielded large spatial clusters of statistically-significant vertices and a greatly inflated false positive rate (Family Wise Error Rate: FWER = 1, cluster false discovery rate: FDR > 0.6). We showed that LMMs resulted in more parsimonious results: smaller clusters and a reduced false positive rate (yet FWER > 5% after correction) but at a cost of increased computation. In practice, the parsimony of LMMs results from controlling for the joint effect of all vertices, which prevents local and distal redundant associations from reaching statistical-significance.
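
    [Schematically, the contrast is between fitting each vertex j on its own, $y = x_j\beta_j + \epsilon$, versus testing $\beta_j$ while all vertices jointly contribute a random effect, $y = x_j\beta_j + u + \epsilon$ with $u \sim \mathcal{N}(0,\, \sigma^2_u B)$ and B the brain-relationship matrix built from all vertices; the random term absorbs the covariance between vertex j and the rest of the cortex, which is what prunes the redundant local and distal associations.]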

    Next, we performed mass-univariate association analyses on 5 real UKB traits (age, sex, BMI, fluid intelligence and smoking status) and LMM yielded fewer and more localised associations. We identified 19 clusters displaying small associations with age, sex and BMI, which suggest a complex architecture of at least dozens of associated areas with those phenotypes.

  20. https://www.lesswrong.com/posts/6miu9BsKdoAi72nkL/a-contamination-theory-of-the-obesity-epidemic?commentId=9gMgjDQ5xgbxe8aLd

  21. https://osf.io/x4fk3/

  22. Everything

  23. 2021-gotz.pdf: ⁠, Friedrich M. Götz, Samuel D. Gosling, Peter J. Rentfrow (2021-07-02; statistics/bias):

    We draw on genetics research to argue that complex psychological phenomena are most likely determined by a multitude of causes and that any individual cause is likely to have only a small effect.

    Building on this, we highlight the dangers of a publication culture that continues to demand large effects. First, it rewards inflated effects that are unlikely to be real and encourages practices likely to yield such effects. Second, it overlooks the small effects that are most likely to be real, hindering attempts to identify and understand the actual determinants of complex psychological phenomena.

    We then explain the theoretical and practical relevance of small effects, which can have substantial consequences, especially when considered at scale and over time. Finally, we suggest ways in which scholars can harness these insights to advance research and practices in psychology (i.e., leveraging the power of big data, machine learning, and science; promoting rigorous preregistration, including prespecifying the smallest effect size of interest; contextualizing effects; changing cultural norms to reward accurate and meaningful effects rather than exaggerated and unreliable effects).

    Only once small effects are accepted as the norm, rather than the exception, can a reliable and reproducible cumulative psychological science be built.

    [See variance-components for one route forward in quantifying small effects given the daunting statistical power challenges. Götz et al appear locked into the conventional framework of directly estimating effects, when what they really need to borrow from genetics is heritability… You can’t afford to gather n in the millions when you aren’t even sure your haystack contains a needle!]

  24. https://surveyanon.wordpress.com/2019/07/22/playing-around-with-gendermetricity/

  25. 2012-herculanohouzel.pdf: ⁠, Suzana Herculano-Houzel (2012-06-19; psychology):

    Neuroscientists have become used to a number of “facts” about the human brain: It has 100 billion neurons and 10- to 50-fold more glial cells; it is larger than expected for its body size among primates and mammals in general, and therefore the most cognitively able; it consumes an outstanding 20% of the total body energy budget despite representing only 2% of body mass because of an increased metabolic need of its neurons; and it is endowed with an overdeveloped cerebral cortex, the largest compared with brain size.

    These facts led to the widespread notion that the human brain is literally extraordinary: an outlier among mammalian brains, defying evolutionary rules that apply to other species, with a uniqueness seemingly necessary to justify the superior cognitive abilities of humans over mammals with even larger brains. These facts, with deep implications for neurophysiology and evolutionary biology, are not grounded on solid evidence or sound assumptions, however.

    Our recent development of a method that allows rapid and reliable quantification of the numbers of cells that compose the whole brain has provided a means to verify these facts. Here, I review this recent evidence and argue that, with 86 billion neurons and just as many nonneuronal cells, the human brain is a scaled-up primate brain in its cellular composition and metabolic cost, with a relatively enlarged cerebral cortex that does not have a relatively larger number of brain neurons yet is remarkable in its cognitive abilities and metabolism simply because of its extremely large number of neurons.