2015-polderman.pdf: “Meta-analysis of the heritability of human traits based on fifty years of twin studies”, (2015-05-18; ):
Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. For a majority (69%) of traits, the observed twin correlations are consistent with a simple and parsimonious model where twin resemblance is solely due to additive genetic variation. The data are inconsistent with substantial influences from shared environment or non-additive genetic variation. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts. All the results can be visualized using the MaTCH webtool.
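One classic way to read twin correlations is Falconer's decomposition, sketched below. It is not necessarily the estimator this meta-analysis used, and the correlations in the usage line are hypothetical. Under the purely additive model the abstract describes, the identical-twin (MZ) correlation is twice the fraternal-twin (DZ) correlation, so the shared-environment term comes out at zero.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Split trait variance with Falconer's formulas (ACE model, no dominance).

    h2: additive genetic share, 2 * (r_MZ - r_DZ)
    c2: shared-environment share, 2 * r_DZ - r_MZ
    e2: non-shared environment plus measurement error, 1 - r_MZ
    """
    return {
        "h2": 2 * (r_mz - r_dz),
        "c2": 2 * r_dz - r_mz,
        "e2": 1 - r_mz,
    }


# Hypothetical correlations that follow the additive pattern r_MZ = 2 * r_DZ.
print(falconer(r_mz=0.50, r_dz=0.25))  # {'h2': 0.5, 'c2': 0.0, 'e2': 0.5}
```

When r_DZ exceeds half of r_MZ, c2 turns positive. That pattern is the kind of shared-environment signal the abstract reports as largely absent.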
1939-pearson.pdf: “"Student" as Statistician”, (1939-01-00; ):
[Egon Pearson describes Student, or Gosset, as a statistician: Student corresponded widely with young statisticians and mathematicians, encouraging them and having an outsized influence not reflected in his publications. Student’s preferred statistical tools were remarkably simple, focused on correlations and standard deviations, but he wielded them effectively in the analysis and efficient design of experiments (particularly agricultural experiments). He was also an early decision-theorist, focused on practical problems connected to his job at the Guinness Brewery; this detachment from academia partly explains why he did not publish methods or results immediately or often. The need to handle the small n of the brewery led to his work on small-sample approximations rather than, like Pearson et al in the Galton biometric tradition, relying on collecting large datasets and using asymptotic methods, and Student carried out one of the first Monte Carlo simulations.]
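The small-sample problem Student faced can be seen with a modern Monte Carlo analogue. This is not his original procedure. The sketch draws samples of n = 4 from a standard normal, computes the usual t-statistic, and counts how often |t| exceeds the large-sample cutoff of 1.96. The function names and parameters are illustrative choices.

```python
import math
import random
import statistics


def student_t(sample, mu=0.0):
    """Modern one-sample t-statistic (uses the n - 1 standard deviation)."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))


def tail_fraction(n=4, trials=20_000, cutoff=1.96, seed=1):
    """Share of simulated normal samples whose |t| exceeds the normal cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(student_t(sample)) > cutoff:
            hits += 1
    return hits / trials


print(tail_fraction(n=4), tail_fraction(n=200, trials=2_000))
```

With n = 4 the share lands well above the nominal 5%. With n = 200 it settles near 5%. Student's small-sample distribution corrects exactly this gap.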
2008-ziliak.pdf: “Retrospectives: Guinnessometrics: The Economic Foundation of “Student’s” t”, (2008-09; ):
In economics and other sciences, “statistical-significance” is by custom, habit, and education a necessary and sufficient condition for proving an empirical result (Ziliak and McCloskey, 2008; McCloskey & Ziliak, 1996). The canonical routine is to calculate what’s called a t-statistic and then to compare its estimated value against a theoretically expected value of it, which is found in “Student’s” t table. A result yielding a t-value greater than or equal to about 2.0 is said to be “statistically-significant at the 95 percent level.” Alternatively, a regression coefficient is said to be “statistically-significantly different from the null, p < 0.05.” Canonically speaking, if a coefficient clears the 95 percent hurdle, it warrants additional scientific attention. If not, not. The first presentation of “Student’s” test of statistical-significance came a century ago, in “The Probable Error of a Mean” (1908b), published by an anonymous “Student.” The author’s commercial employer required that his identity be shielded from competitors, but we have known for some decades that the article was written by William Sealy Gosset (1876–1937), whose entire career was spent at Guinness’s brewery in Dublin, where Gosset was a master brewer and experimental scientist (E. S. Pearson, 1937). Perhaps surprisingly, the ingenious “Student” did not give a hoot for a single finding of “statistical”-significance, even at the 95 percent level of statistical-significance as established by his own tables. Beginning in 1904, “Student”, who was a businessman besides a scientist, took an economic approach to the logic of uncertainty, arguing finally that statistical-significance is “nearly valueless” in itself.
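Student's objection can be made concrete with the "canonical routine" described above. A t-statistic mixes effect size with sample size, so clearing the 2.0 hurdle says little about how much an effect matters. The numbers below are hypothetical, and the helper names are illustrative.

```python
import math


def t_from_summary(mean_diff: float, sd: float, n: int) -> float:
    """One-sample t-statistic: mean difference divided by its standard error."""
    return mean_diff / (sd / math.sqrt(n))


def clears_hurdle(t: float, cutoff: float = 2.0) -> bool:
    """The rule of thumb in the text: |t| of about 2.0 or more is 'significant'."""
    return abs(t) >= cutoff


# Hypothetical: a tiny effect measured on a million units...
tiny_but_precise = t_from_summary(mean_diff=0.01, sd=1.0, n=1_000_000)
# ...versus a large effect measured on nine units.
large_but_noisy = t_from_summary(mean_diff=0.5, sd=1.0, n=9)
print(clears_hurdle(tiny_but_precise), clears_hurdle(large_but_noisy))  # True False
```

The routine flags the negligible effect and discards the large one. This is the sense in which significance alone is "nearly valueless".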
1923-student.pdf: “On Testing Varieties of Cereals”, Student (William Sealy Gosset)
1920-thorndike.pdf: “Intelligence and Its Uses”, Edward L. Thorndike
We propose a summary statistic for the economic well-being of people in a country. Our measure incorporates consumption, leisure, mortality, and inequality, first for a narrow set of countries using detailed micro data, and then more broadly using multi-country datasets. While welfare is highly correlated with GDP per capita, deviations are often large. Western Europe looks considerably closer to the United States, emerging Asia has not caught up as much, and many developing countries are further behind. Each component we introduce plays an important role in accounting for these differences, with mortality being most important.
Key Point 1: GDP per person is an excellent indicator of welfare across the broad range of countries: the two measures have a correlation of 0.98. Nevertheless, for any given country, the difference between the two measures can be important. Across 13 countries, the median deviation is about 35%.
Figure 5 illustrates this first point. The top panel plots the welfare measure, λ, against GDP per person. What emerges prominently is that the two measures are highly correlated, with a correlation coefficient (for the logs) of 0.98. Thus per capita GDP is a good proxy for welfare under our assumptions. At the same time, there are clear departures from the 45° line. In particular, many countries with very low GDP per capita exhibit even lower welfare. As a result, welfare is more dispersed (standard deviation of 1.51 in logs) than is income (standard deviation of 1.27 in logs).
The bottom panel provides a closer look at the deviations. This figure plots the ratio of welfare to per capita GDP across countries. The European countries have welfare measures 22% higher than their incomes. The remaining countries, in contrast, have welfare levels that are typically 25–50% below their incomes. The way to reconcile these large deviations with the high correlation between welfare and income is that the “scales” are so different. Incomes vary by more than a factor of 64 in our sample, ie., 6,300%, whereas the deviations are on the order of 25–50%.
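The reconciliation above, a 0.98 correlation alongside 25–50% gaps, is partly arithmetic. The sketch does two things. First, it converts the income range into a percentage. Second, it shows how little a deviation of that size dents the correlation when log incomes vary so widely. The second step assumes that log welfare equals log income plus an independent deviation; this is a simplifying assumption, not the paper's model.

```python
import math


def pct_increase(factor: float) -> float:
    """Turn a multiplicative gap into a percentage increase."""
    return (factor - 1.0) * 100.0


def corr_signal_plus_noise(sd_signal: float, sd_noise: float) -> float:
    """Correlation of X with X + E when E is independent of X."""
    return sd_signal / math.sqrt(sd_signal**2 + sd_noise**2)


print(pct_increase(64))  # 6300.0
# 1.27 is the SD of log income from the text; a deviation SD of 0.3 log points
# (roughly a 35% gap) is an illustrative assumption.
print(round(corr_signal_plus_noise(1.27, 0.3), 3))  # 0.973
```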
Eight of ten leading international indices that assess developing countries on dimensions beyond GDP show strong redundancy, bias, and unilateralism. The quantitative comparison shows that the same countries consistently lead the rankings, with a low standard deviation. The dependence on GDP is striking: do the indices only measure indicators that are direct effects of a strong GDP? Although the causal direction could also run the other way, the standard deviation reveals a strong bias: only one of the twenty countries with the highest standard deviation is among the Top-20 countries of the world, but 11 countries among those with the lowest standard deviation are. This article looks at the downsides of global statistics and at methods for comparing their findings. It is the result of a pre-study to assess Social Capital in developing countries, made for the German Federal Ministry for Economic Cooperation and Development. The study led to the UN Sustainable Development Goals (UN SDG) project World Social Capital Monitor.
1950-good-probabilityandtheweighingofevidence.pdf#page=96: “Probability and the Weighing of Evidence”, I. J. Good
1954-tukey.pdf: “Unsolved Problems of Experimental Statistics”, John W. Tukey
1951-yates.pdf: “The Influence of 'Statistical Methods for Research Workers' on the Development of the Science of Statistics”, Francis Yates
1959-lehmann-testingstatisticalhypotheses.pdf: “Testing Statistical Hypotheses (First Edition)”, E. L. Lehmann
1968-nichols.pdf: “Heredity, Environment, and School Achievement”, Robert C. Nichols
1975-oakes.pdf: “On the alleged falsity of the null hypothesis”, William F. Oakes
1972-page.pdf: “How We *All* Failed In Performance Contracting”, Ellis B. Page
1976-loehlin-heredityenvironmentandpersonality.pdf: “Heredity, Environment, & Personality: A Study of 850 Sets of Twins”, (1976; ):
This volume reports on a study of 850 pairs of twins who were tested to determine the influence of heredity and environment on individual differences in personality, ability, and interests. It presents the background, research design, and procedures of the study, a complete tabulation of the test results, and the authors’ extensive analysis of their findings. Based on one of the largest studies of twin behavior ever conducted, the book challenges a number of traditional beliefs about genetic and environmental contributions to personality development.
The subjects were chosen from participants in the National Merit Scholarship Qualifying Test of 1962 and were mailed a battery of personality and interest questionnaires. In addition, parents of the twins were sent questionnaires asking about the twins’ early experiences. A similar sample of nontwin students who had taken the merit exam provided a comparison group. The questions investigated included how twins are similar to or different from non-twins, how identical twins are similar to or different from fraternal twins, how the personalities and interests of twins reflect genetic factors, how the personalities and interests of twins reflect early environmental factors, and what implications these questions have for the general issue of how heredity and environment influence the development of psychological characteristics. In attempting to answer these questions, the authors shed new light on the importance of both genes and environment and have formed the basis for new approaches in behavior genetic research.
In this paper, I wish to examine a dogma of inferential procedure which, for psychologists at least, has attained the status of a religious conviction. The dogma to be scrutinized is the “null-hypothesis test” orthodoxy that passing statistical judgment on a scientific hypothesis by means of experimental observation is a decision procedure wherein one rejects or accepts a null hypothesis according to whether or not the value of a sample statistic yielded by an experiment falls within a certain predetermined “rejection region” of its possible values. The thesis to be advanced is that despite the awesome preeminence this method has attained in our experimental journals and textbooks of applied statistics, it is based upon a fundamental misunderstanding of the nature of rational inference, and is seldom if ever appropriate to the aims of scientific research.
1973-keuth.pdf: “On Prior Probabilities of Rejecting Statistical Hypotheses”, Herbert Keuth
1975-swoyer.pdf: “Theory Confirmation in Psychology”, Chris Swoyer, Thomas C. Monson
1982-loftus-essenceofstatistics.pdf: “Essence of Statistics (Second Edition)”, Geoffry R. Loftus, Elizabeth F. Loftus
1986-gottfredson.pdf: “The _g_ factor in employment”, Linda S. Gottfredson
1986-avery.pdf: “Origins of and Reactions to the PTC conference on _The g Factor In Employment Testing_”, Lillian Markos Avery
1986-jensen.pdf: “g: Artifact or Reality?”, Arthur R. Jensen
1986-thorndike.pdf: “The role of general ability in prediction”, Robert L. Thorndike
1986-hunter.pdf: “Cognitive ability, cognitive aptitudes, job knowledge, and job performance”, John E. Hunter
1986-gottfredson-2.pdf: “Validity versus utility of mental tests: Example of the SAT”, Linda S. Gottfredson, James Crouse
1986-gottfredson-3.pdf: “Societal consequences of the g factor in employment”, Linda S. Gottfredson
1986-hawk.pdf: “Real world implications of g”, John Hawk
1986-arvey.pdf: “General ability in employment: A discussion”, Richard D. Arvey
1986-humphreys.pdf: “Commentary [on 'The _g_ factor in employment special issue']”, Lloyd G. Humphreys
1986-linn.pdf: “Comments on the g factor in employment testing”, Robert L. Linn
1986-tyler.pdf: “Back to Spearman?”, Leona E. Tyler
1941-thurstone-factorialstudiesofintelligence.pdf: “Factorial Studies Of Intelligence”, Louis L. Thurstone, Thelma G. Thurstone
1941-evans.pdf: “A New Measure of Introversion–Extroversion”, (1941; ):
This paper describes the development of relatively independent measures for 3 types of Introversion-Extroversion: Thinking, Social, and Emotional. The need for clarifying the concept of I-E and for devising new inventories can best be understood by reviewing the confusion concerning its nature and measurement. In the effort to simplify the original complex description of I-E by Jung, psychologists either have introduced new concepts or emphasized varying phases of Jung’s definition. In this process of elaboration, they have actually complicated rather than clarified the idea of I-E. The use of these terms in the popular literature has only added to the confusion. Unfortunately, introversion, at least in the popular writings on psychology, has come to denote an undesirable personality tendency which borders on a neurotic condition.
In general, the available I-E inventories purport to measure a general, undifferentiated trait. However, the intercorrelations between the published inventories are surprisingly low. Only 5 of the 19 coefficients of intercorrelation reported in the literature for nine inventories are above 0.40, and only 2 are above 0.80. The 2 coefficients above 0.80 are between 2 inventories and revised forms of these same inventories.
…This study has reduced the confusion in the field of measurement of I-E by getting away from the general undifferentiated concept of I-E. An inventory was constructed to measure, not a general trait, but 3 types or phases of I-E which were clearly defined. By a simple technique of item analysis, 3 homogeneous and relatively independent I-E tests were developed. Each test seems to be sufficiently reliable for individual prediction. The demonstrated ability of each test to discriminate between groups of college students which one would logically expect to be characteristically different in a given type of I-E justifies the conclusion that each test is sufficiently valid for the inventory to be employed in the diagnosis and counseling of college students.
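The paper's "simple technique of item analysis" is not specified here. The sketch below uses the corrected item-total correlation, one common choice, as an assumed stand-in. Each item is correlated with the sum of the *other* items on its scale, and low values flag items that do not hang together with the rest. The toy responses are hypothetical.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(responses):
    """responses[p][i] is person p's score on item i.

    Each item is correlated with the total of the remaining items, so the
    item does not inflate its own correlation.
    """
    n_items = len(responses[0])
    out = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        out.append(pearson(item, rest))
    return out


# Hypothetical data: items 0 and 1 agree perfectly; item 2 is only loosely related.
data = [[1, 1, 3], [2, 2, 1], [3, 3, 5], [4, 4, 2], [5, 5, 4]]
print([round(r, 3) for r in corrected_item_total(data)])  # [0.806, 0.806, 0.3]
```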
1992-thompson.pdf: “Two and One-Half Decades of Leadership in Measurement and Evaluation”, Bruce Thompson
1997-muzaik.pdf: “There Is a Time and a Place for Significance Testing”, Stanley A. Mulaik, Nambury S. Raju, Richard A. Harshman
1961-ames.pdf: “Distributions of Correlation Coefficients in Economic Time Series”, (1961; ):
This paper presents results, mainly in tabular form, of a sampling experiment in which 100 economic time series 25 years long were drawn at random from the Historical Statistics for the United States. Sampling distributions of coefficients of correlation and autocorrelation were computed using these series, and their logarithms, with and without correction for linear trend.
We find that the frequency distribution of autocorrelation coefficients has the following properties:
- It is roughly invariant under logarithmic transformation of data.
- It is approximated by a Pearson Type XII function.
- It approaches a rectangular distribution symmetric about 0 as the lag increases.
The autocorrelation properties observed are not to be explained by linear trends alone. Correlations and lagged cross-correlations are quite high for all classes of data. For example, given a randomly selected series, it is possible to find, by random drawing, another series which explains at least 50% of the variance of the first one within 2 to 6 random trials, depending on the class of data involved. The sampling distributions obtained provide a basis for tests of statistical-significance of correlations of economic time series. We also find that our economic series are well described by exact linear difference equations of low order.
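A quick simulation shows why short, persistent series correlate so readily. The sketch uses driftless Gaussian random walks of length 25 as a stand-in for trending economic series; this is an assumption, not the Historical Statistics data. It counts how often one independent series "explains" at least half the variance of another (r² ≥ 0.5). Independent white noise of the same length serves as the comparison.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(rng, n):
    """Cumulative sum of independent standard-normal shocks (no drift)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def share_r2_at_least(generator, threshold=0.5, n=25, trials=2_000, seed=0):
    """Fraction of independent pairs whose squared correlation is >= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = generator(rng, n), generator(rng, n)
        if pearson(x, y) ** 2 >= threshold:
            hits += 1
    return hits / trials


print(share_r2_at_least(random_walk), share_r2_at_least(white_noise))
```

If a fraction q of random pairs clears the bar, the expected number of draws until a hit is 1/q. Under this model, the "2 to 6 trials" reported above corresponds to per-draw success rates of roughly 1/6 to 1/2.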
In conventional epidemiology confounding of the exposure of interest with lifestyle or socioeconomic factors, and reverse causation whereby disease status influences exposure rather than vice versa, may invalidate causal interpretations of observed associations. Conversely, genetic variants should not be related to the factors that distort associations in conventional observational epidemiological studies. Furthermore, disease onset will not influence genotype. Therefore, it has been suggested that genetic variants that are known to be associated with a modifiable (nongenetic) risk factor can be used to help determine the causal effect of this modifiable risk factor on disease outcomes. This approach, mendelian randomization, is increasingly being applied within epidemiological studies. However, there is debate about the underlying premise that associations between genotypes and disease outcomes are not confounded by other risk factors. We examined the extent to which genetic variants, on the one hand, and nongenetic environmental exposures or phenotypic characteristics on the other, tend to be associated with each other, to assess the degree of confounding that would exist in conventional epidemiological studies compared with mendelian randomization studies.
Methods and Findings:
We estimated pairwise correlations between nongenetic baseline variables and genetic variables in a cross-sectional study, comparing the number of correlations that were statistically-significant at the 5%, 1%, and 0.01% level (α = 0.05, 0.01, and 0.0001, respectively) with the number expected by chance if all variables were in fact uncorrelated, using a two-sided binomial exact test. We demonstrate that behavioural, socioeconomic, and physiological factors are strongly interrelated, with 45% of all possible pairwise associations between 96 nongenetic characteristics (n = 4,560 correlations) being statistically-significant at the p < 0.01 level (the ratio of observed to expected statistically-significant associations was 45; p-value for difference between observed and expected < 0.000001). Similar findings were observed for other levels of significance. In contrast, genetic variants showed no greater association with each other, or with the 96 behavioural, socioeconomic, and physiological factors, than would be expected by chance.
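The counting argument above can be checked directly. There are C(96, 2) = 4,560 pairs, and at α = 0.01 about 45.6 should be significant by chance. The observed count of 2,052 is reconstructed from the rounded "45%", so it is approximate. The sketch computes a one-sided exact binomial upper tail in log space, because the probability is far too small for ordinary floating point. The paper reports a two-sided test, so this is a simplification.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log10_upper_tail(k_obs: int, n: int, p: float) -> float:
    """log10 P(X >= k_obs), summed in log space so tiny terms do not underflow."""
    logs = [log_binom_pmf(k, n, p) for k in range(k_obs, n + 1)]
    top = max(logs)
    return (top + math.log(sum(math.exp(v - top) for v in logs))) / math.log(10)


n_pairs = math.comb(96, 2)        # 4,560 pairwise correlations
alpha = 0.01
expected = n_pairs * alpha        # about 45.6 significant by chance
observed = round(0.45 * n_pairs)  # approximate, reconstructed from "45%"
print(n_pairs, observed / expected, log10_upper_tail(observed, n_pairs, alpha))
```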
These data illustrate why observational studies have produced misleading claims regarding potentially causal factors for disease. The findings demonstrate the potential power of a methodology that utilizes genetic variants as indicators of exposure level when studying environmentally modifiable risk factors.
In a cross-sectional study Davey Smith and colleagues show why observational studies can produce misleading claims regarding potential causal factors for disease, and illustrate the use of mendelian randomization to study environmentally modifiable risk factors.
Epidemiology is the study of the distribution and causes of human disease. Observational epidemiological studies investigate whether particular modifiable factors (for example, smoking or eating healthily) are associated with the risk of a particular disease. The link between smoking and lung cancer was discovered in this way. Once the modifiable factors associated with a disease are established as causal factors, individuals can reduce their risk of developing that disease by avoiding causative factors or by increasing their exposure to protective factors. Unfortunately, modifiable factors that are associated with risk of a disease in observational studies sometimes turn out not to cause or prevent disease. For example, higher intake of vitamins C and E apparently protected people against heart problems in observational studies, but taking these vitamins did not show any protection against heart disease in randomized controlled trials (studies in which identical groups of patients are randomly assigned various interventions and their health is then monitored). One explanation for this type of discrepancy is known as confounding—the distortion of the effect of one factor by the presence of another that is associated both with the exposure under study and with the disease outcome. So in this example, people who took vitamin supplements might also have exercised more than people who did not take supplements, and it could have been the exercise rather than the supplements that was protective against heart disease.
Why Was This Study Done?:
It isn’t always possible to check the results of observational studies in randomized controlled trials, so epidemiologists have developed other ways to minimize confounding. One approach is known as mendelian randomization. Several gene variants have been identified that affect risk factors. For example, variants in a gene called APOE affect the level of cholesterol in an individual’s blood, a risk factor for heart disease. People inherit gene variants randomly from their parents to build up their own unique genotype (total genetic makeup). Consequently, a study that examines the associations between a gene variant and a disease can indicate whether the risk factor affected by that gene variant causes the disease. There should be no confounding in this type of study, the argument goes, because different genetic variants should not be associated with each other or with nongenetic variables that typically confound directly assessed associations between risk factors and disease. But is this true? In this study, the researchers tested whether nongenetic risk factors are confounded by each other, whether genetic variants are confounded by nongenetic risk factors, and whether genetic variants are confounded by other genetic variants.
What Did the Researchers Do and Find?:
Using data collected in the British Women’s Heart and Health Study, the researchers calculated how many pairs of nongenetic variables (for example, frequency of eating meat, alcohol intake) were statistically-significantly correlated with each other, that is, how many pairs were more strongly associated than chance alone would produce. They compared this number with the number of correlations that would occur by chance if all the variables were totally independent. When the researchers assumed that 1 in 100 pairs of variables would be correlated by chance, statistically-significant correlations were seen 45 times more frequently than expected. When the researchers repeated this exercise with genetic variants, the ratio of observed to expected statistically-significant correlations was 1.58, a figure not significantly different from 1. Similarly, the ratio of observed to expected statistically-significant correlations for pairs made up of a genetic and a nongenetic variable was 1.22.
What Do These Findings Mean?:
These findings have two main implications. First, the large excess of observed over expected associations among the nongenetic variables indicates that many nongenetic modifiable factors occur in clusters; for example, people with healthy diets often have other healthy habits. Researchers doing observational studies always try to adjust for confounding, but this result suggests that this adjustment will be hard to do, in part because it will not always be clear which factors are confounders. Second, the lack of a large excess of observed over expected associations among the genetic variables (and also among genetic variables paired with nongenetic variables) indicates that little confounding is likely to occur in studies that use mendelian randomization. In other words, this approach is a valid way to identify which environmentally modifiable risk factors cause human disease.
Additional web resources are listed in the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040352.
1992-phillips.pdf: “Bias in relative odds estimation owing to imprecise measurement of correlated exposures”, Andrew N. Phillips, George Davey Smith
1991-phillips.pdf: “How independent are 'independent' effects? Relative risk estimation when correlated exposures are measured imprecisely”, Andrew N. Phillips, George Davey Smith
1992-smith.pdf: “Smoking as 'independent' risk factor for suicide: illustration of an artifact from observational epidemiology?”, George Davey Smith, Andrew N. Phillips, James D. Neaton
2014-shen.pdf: “When Correcting for Unreliability of Job Performance Ratings, the Best Estimate Is Still 0.52”, (2014-12-01; ):
In this commentary we answer 3 questions that are often posed when debating the usefulness and accuracy of correcting criterion-related validity coefficients for unreliability: (a) Is 0.52 an accurate estimate? (b) Do corrections for criterion unreliability lead us to choose different selection tools? (c) Is too much variance explained?
[1. Yes; 2. No, because rank-order of tools’ utility is preserved by the corrections; 3. No, because while everything is correlated r = 0.30 on average, most of those variables are unknowable at hiring time and also adding up variables ignores diminishing returns/intercorrelations between the predictors, so one will never predict perfectly.]
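The diminishing-returns point in answer 3 can be made concrete. The sketch assumes k predictors that each correlate equally with the criterion and equally with one another. That equicorrelation setup is a simplifying assumption, and the 0.30 values simply echo the "everything is correlated r = 0.30" figure. Under it, the squared multiple correlation is k·r² / (1 + (k − 1)·ρ), which can never exceed r²/ρ however many predictors are added.

```python
def equicorrelated_r2(k: int, validity: float, intercorrelation: float) -> float:
    """Squared multiple correlation for k predictors that each correlate
    `validity` with the criterion and `intercorrelation` with one another."""
    return k * validity**2 / (1 + (k - 1) * intercorrelation)


for k in (1, 2, 5, 20, 1000):
    print(k, round(equicorrelated_r2(k, validity=0.30, intercorrelation=0.30), 3))
```

With validity = intercorrelation = 0.30, R² climbs from 0.09 toward a ceiling of 0.30 and never reaches it. Each added predictor contributes less, because it mostly repeats what the others already carry.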
Conclusion: Based on our review of the evidence, the 0.52 estimate of the interrater reliability of supervisor ratings of job performance is an appropriate estimate; corrections for unreliability do not appear to change our decisions regarding the choice of one selection tool over another; and most variables may be more strongly correlated than people expect, making it difficult to demonstrate continued incremental validity in predicting job performance when adding additional predictors. We agree with LeBreton et al. that psychologists need to be careful when applying and interpreting corrections, and we are thankful that they sponsored a discussion on the topic. Corrections are critical for both basic science (ie., estimating population parameters) and practice (ie., recognizing artifacts attenuating estimates on which our work may be evaluated by stakeholders, courts, and other third parties). Ultimately, the appropriate use of corrections depends on the purpose of the project. If the goal is to explain variation among a sample of incumbents on observed criterion scores, then no corrections need to be made. If the goal is to explain variation among incumbents on a true score for job performance, then a correction for unreliability is not only desirable but necessary. Finally, if the goal is to estimate how much variation among applicants is explained by a predictor for a true score on job performance, then corrections for range restriction and unreliability are indispensable. This goal represents the target validity inference that was included in Binning and Barrett’s (1989) figure, but (rather interestingly) is omitted from LeBreton et al.’s reproduction of that figure. We believe that the target validity inference is the most important inference in personnel selection; it provides the critical link from the observed predictor to the criterion construct (see also Putka & Sackett, 2010).
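For the "true score for job performance" goal described above, the standard correction for criterion unreliability divides the observed validity by the square root of the criterion's reliability. The sketch covers only that correction, not range restriction. It plugs in the 0.52 interrater reliability discussed here; the observed validity of 0.30 is hypothetical.

```python
import math


def correct_for_criterion_unreliability(r_observed: float, r_yy: float) -> float:
    """Disattenuate a validity coefficient for unreliability in the criterion only:
    r_true = r_observed / sqrt(r_yy)."""
    if not 0 < r_yy <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_observed / math.sqrt(r_yy)


# 0.52: interrater reliability of supervisor ratings; 0.30: hypothetical observed validity.
print(round(correct_for_criterion_unreliability(0.30, 0.52), 3))  # 0.416
```

Because the same divisor applies to every predictor validated against the same criterion, the correction rescales validities without reordering them. This matches answer 2 above.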
2019-gordon.pdf: “A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook”, (2019-05-04; ):
Measuring the causal effects of digital advertising remains challenging despite the availability of granular data. Unobservable factors make exposure endogenous, and advertising’s effect on outcomes tends to be small. In principle, these concerns could be addressed using randomized controlled trials (RCTs). In practice, few online ad campaigns rely on RCTs and instead use observational methods to estimate ad effects. We assess empirically whether the variation in data typically available in the advertising industry enables observational methods to recover the causal effects of online advertising. Using data from 15 U.S. advertising experiments at Facebook comprising 500 million user-experiment observations and 1.6 billion ad impressions, we contrast the experimental results to those obtained from multiple observational models. The observational methods often fail to produce the same effects as the randomized experiments, even after conditioning on extensive demographic and behavioral variables. In our setting, advances in causal inference methods do not allow us to isolate the exogenous variation needed to estimate the treatment effects. We also characterize the incremental explanatory power our data would require to enable observational methods to successfully measure advertising effects. Our findings suggest that commonly used observational approaches based on the data usually available in the industry often fail to accurately measure the true effect of advertising.
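The phrase "unobservable factors make exposure endogenous" can be illustrated with a toy simulation. It has nothing to do with Facebook's data or the paper's models. Each hypothetical user has a latent propensity u that raises baseline conversion. Under an RCT, exposure is a coin flip. Under the hypothetical targeting rule, high-u users are more likely to see the ad.

```python
import random


def simulate(n=200_000, true_lift=0.005, randomized=True, seed=0):
    """Exposed-minus-unexposed difference in conversion rates.

    Latent propensity u ~ Uniform(0, 1) raises baseline conversion by 0.05 * u.
    Randomized: exposure probability 0.5. Otherwise (hypothetical targeting):
    exposure probability u, so exposure is correlated with baseline conversion.
    """
    rng = random.Random(seed)
    conversions = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        u = rng.random()
        exposed = rng.random() < (0.5 if randomized else u)
        p_convert = 0.02 + 0.05 * u + (true_lift if exposed else 0.0)
        conversions[exposed] += rng.random() < p_convert
        counts[exposed] += 1
    return conversions[True] / counts[True] - conversions[False] / counts[False]


rct_estimate = simulate(randomized=True)
naive_estimate = simulate(randomized=False)
print(round(rct_estimate, 4), round(naive_estimate, 4))
```

In this setup, E[u | exposed] = 2/3 and E[u | unexposed] = 1/3. The naive comparison is therefore biased upward by about 0.05 × 1/3 ≈ 0.017, more than three times the true lift of 0.005. The randomized comparison has no such bias.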
We discuss methods of data collection and analysis that emphasize the power of individual personality items for predicting real world criteria (eg., smoking, exercise, self-rated health). These methods are borrowed by analogy from radio astronomy and human genomics. Synthetic Aperture Personality Assessment (SAPA) applies a matrix sampling procedure that synthesizes very large covariance matrices through the application of massively missing at random data collection. These large covariance matrices can be applied, in turn, in Persome Wide Association Studies (PWAS) to form personality prediction scores for particular criteria. We use two open source data sets (n = 4,000 and 126,884 with 135 and 696 items respectively) for demonstrations of both of these procedures. We compare these procedures to the more traditional use of “Big 5” or a larger set of narrower factors (the “little 27”). We argue that there is more information at the item level than is used when aggregating items to form factorially derived scales.
[Keywords: Persome, Persome Wide Association Studies, Synthetic Aperture Personality Assessment (SAPA), Massively Missing Completely at Random (MMCAR), Scale construction, Factor analysis, Item analysis]
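The matrix-sampling idea behind SAPA can be sketched at toy scale. Each simulated participant answers a random subset of items. Each inter-item correlation is then computed from whoever happened to answer both items (pairwise-complete data). Because the administered subset is drawn at random, the missingness is completely at random. The one-factor data, item counts, and loading are illustrative assumptions, far smaller than the real SAPA designs.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def sapa_demo(n_people=20_000, n_items=6, items_per_person=3, loading=0.6, seed=0):
    """Planned-missing administration plus pairwise-complete correlations.

    Hypothetical one-factor model: item = loading * factor + unique noise,
    so every true inter-item correlation is loading**2 (0.36 here).
    Returns {(a, b): (n_answering_both, r)}.
    """
    rng = random.Random(seed)
    unique_sd = (1 - loading**2) ** 0.5
    answered = [{} for _ in range(n_items)]  # answered[item][person] = score
    for person in range(n_people):
        factor = rng.gauss(0.0, 1.0)
        for item in rng.sample(range(n_items), items_per_person):
            answered[item][person] = loading * factor + unique_sd * rng.gauss(0.0, 1.0)
    matrix = {}
    for a in range(n_items):
        for b in range(a + 1, n_items):
            common = sorted(answered[a].keys() & answered[b].keys())
            xs = [answered[a][p] for p in common]
            ys = [answered[b][p] for p in common]
            matrix[(a, b)] = (len(common), pearson(xs, ys))
    return matrix


for pair, (n_both, r) in sapa_demo().items():
    print(pair, n_both, round(r, 3))
```

Each participant sees only half the items. Even so, every pair is covered by about a fifth of the sample, and each correlation lands near the true 0.36.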
2006-ozer.pdf: “Personality and the Prediction of Consequential Outcomes”, (2006-02-01; ):
Personality has consequences. Measures of personality have contemporaneous and predictive relations to a variety of important outcomes. Using the Big Five factors as heuristics for organizing the research literature, numerous consequential relations are identified. Personality dispositions are associated with happiness, physical and psychological health, spirituality, and identity at an individual level; associated with the quality of relationships with peers, family, and romantic others at an interpersonal level; and associated with occupational choice, satisfaction, and performance, as well as community involvement, criminal activity, and political ideology at a social institutional level.
[Keywords: individual differences, traits, life outcomes, consequences]
Background: A wide range of diseases show some degree of clustering in families; family history is therefore an important aspect for clinicians when making risk predictions. Familial aggregation is often quantified in terms of a familial relative risk (FRR), and although at first glance this measure may seem simple and intuitive as an average risk prediction, its implications are not straightforward.
Methods: We use two statistical models for the distribution of disease risk in a population: a dichotomous risk model that gives an intuitive understanding of the implication of a given FRR, and a continuous risk model that facilitates a more detailed computation of the inequalities in disease risk. Published estimates of FRRs are used to produce Lorenz curves and Gini indices that quantify the inequalities in risk for a range of diseases.
Results: We demonstrate that even a moderate familial association in disease risk implies a very large difference in risk between individuals in the population. We give examples of diseases for which this is likely to be true, and we further demonstrate the relationship between the point estimates of FRRs and the distribution of risk in the population.
Conclusions: The variation in risk for several severe diseases may be larger than the variation in income in many countries. The implications of familial risk estimates should be recognized by epidemiologists and clinicians.
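A two-class risk distribution shows how a moderate familial relative risk can coexist with large inequality. The sketch does not reproduce the paper's parameterization. It assumes both relatives carry the same risk value R and fall ill independently given R, which gives FRR = E[R²] / E[R]². It then computes the Gini index of the same distribution, on the scale commonly used for income. The population shares and risks are hypothetical.

```python
def _normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def gini(values, weights):
    """Gini index of a discrete distribution: mean absolute difference / (2 * mean)."""
    probs = _normalize(weights)
    mean = sum(p * x for p, x in zip(probs, values))
    mad = sum(p_i * p_j * abs(x_i - x_j)
              for p_i, x_i in zip(probs, values)
              for p_j, x_j in zip(probs, values))
    return mad / (2 * mean)


def shared_risk_frr(values, weights):
    """FRR when both relatives share risk R and fall ill independently given R
    (an assumption): E[R^2] / E[R]^2."""
    probs = _normalize(weights)
    mean = sum(p * x for p, x in zip(probs, values))
    second_moment = sum(p * x * x for p, x in zip(probs, values))
    return second_moment / mean**2


# Hypothetical population: 10% at 5% lifetime risk, 90% at 0.5%.
risks, shares = [0.05, 0.005], [0.1, 0.9]
print(round(shared_risk_frr(risks, shares), 2), round(gini(risks, shares), 3))  # 3.02 0.426
```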
Personality researchers have recently advocated the use of very short personality inventories in order to minimize administration time. However, few such inventories are currently available. Here I introduce an automated method that can be used to abbreviate virtually any personality inventory with minimal effort. After validating the method against existing measures in Studies 1 and 2, a new 181-item inventory is generated in Study 3 that accurately recaptures scores on 8 different broadband inventories comprising 203 distinct scales. Collectively, the results validate a powerful new way to improve the efficiency of personality measurement in research settings.
Understanding the nature and extent of horizontal pleiotropy, where one genetic variant has independent effects on multiple observable traits, is vitally important for our understanding of the genetic architecture of human phenotypes, as well as the design of genome-wide association studies (GWASs) and Mendelian randomization (MR) studies. Many recent studies have pointed to the existence of horizontal pleiotropy among human phenotypes, but the exact extent remains unknown, largely due to difficulty in disentangling the inherently correlated nature of observable traits. Here, we present a statistical framework to isolate and quantify horizontal pleiotropy in human genetic variation using a two-component pleiotropy score computed from summary statistic data derived from published GWASs. This score uses a statistical whitening procedure to remove correlations between observable traits and normalize effect sizes across all traits, and is able to detect horizontal pleiotropy under a range of different models in our simulations. When applied to real human phenotype data using association statistics for 1,564 traits measured in 337,119 individuals from the UK Biobank, our score detects a statistically-significant excess of horizontal pleiotropy. This signal of horizontal pleiotropy is pervasive throughout the human genome and across a wide range of phenotypes and biological functions, but is especially prominent in regions of high linkage disequilibrium and among phenotypes known to be highly polygenic and heterogeneous. Using our pleiotropy score, we identify thousands of loci with extreme levels of horizontal pleiotropy, a majority of which have never been previously reported in any published GWAS. This highlights an under-recognized class of genetic variation that has weak effects on many distinct phenotypes but no specific marked effect on any one phenotype. We show that a large fraction of these loci replicate using independent datasets of summary statistics.
Our results highlight the central role horizontal pleiotropy plays in the genetic architecture of human phenotypes, and the importance of modeling horizontal pleiotropy in genomic medicine.
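The whitening step can be illustrated on its own. The sketch below is a generic Cholesky whitening of per-variant z-score vectors across traits, and it assumes the null trait-correlation matrix is already known. It does not reproduce the paper's two-component score, whose exact definition is not given here. After whitening, the sum of squared components is a Mahalanobis-type statistic that follows a chi-square distribution with one degree of freedom per trait under the null. The function names and the 3-trait correlation matrix are hypothetical.

```python
import numpy as np


def whiten(z, trait_corr):
    """Decorrelate per-variant z-score vectors across traits.

    z: (n_variants, n_traits) array of GWAS z-scores.
    trait_corr: (n_traits, n_traits) null correlation of z-scores between traits
    (assumed known here). Cholesky whitening: trait_corr = L @ L.T, w = L^{-1} z.
    """
    L = np.linalg.cholesky(trait_corr)
    return np.linalg.solve(L, z.T).T


def multitrait_chi2(z, trait_corr):
    """Sum of squared whitened z-scores per variant (chi-square, n_traits df, under the null)."""
    w = whiten(z, trait_corr)
    return np.sum(w ** 2, axis=1)


rng = np.random.default_rng(0)
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.5],
                 [0.3, 0.5, 1.0]])
# Simulate null z-scores that are correlated across the three traits.
z_null = rng.multivariate_normal(np.zeros(3), corr, size=100_000)
w = whiten(z_null, corr)
print(np.round(np.cov(w, rowvar=False), 2))  # approximately the identity matrix
print(multitrait_chi2(z_null, corr).mean())  # close to 3, the number of traits
```

Cholesky whitening is only one valid choice. Any matrix W with W·C·Wᵀ = I decorrelates the traits and yields the same chi-square statistic, though the individual whitened components differ.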
Accurate estimation of genetic correlation requires large sample sizes and access to genetically informative data, which are not always available. Accordingly, phenotypic correlations are often assumed to reflect genotypic correlations in evolutionary biology. Cheverud’s conjecture asserts that the use of phenotypic correlations as proxies for genetic correlations is appropriate. Empirical evidence of the conjecture has been found across plant and animal species, with results suggesting that there is indeed a robust relationship between the two. Here, we investigate the conjecture in human populations, an analysis made possible by recent developments in availability of human genomic data and computing resources. A sample of 108,035 British European individuals from the UK Biobank was split equally into discovery and replication datasets. 17 traits were selected based on sample size, distribution and heritability. Genetic correlations were calculated using linkage disequilibrium score regression applied to the genome-wide association summary statistics of pairs of traits, and compared within and across datasets. Strong and statistically-significant correlations were found for the between-dataset comparison, suggesting that the genetic correlations from one independent sample were able to predict the phenotypic correlations from another independent sample within the same population. Designating the selected traits as morphological or non-morphological indicated little difference in correlation. The results of this study support the existence of a relationship between genetic and phenotypic correlations in humans. This finding is of specific interest in anthropological studies, which use measured phenotypic correlations to make inferences about the genetics of ancient human populations.
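One simple way to compare a genetic-correlation matrix with a phenotypic-correlation matrix is to correlate their matching trait-pair entries. With 17 traits, that gives 17 × 16 / 2 = 136 pairs. The sketch below assumes this is the comparison of interest. The paper's exact procedure may differ, and the function names and toy matrices are invented for illustration.

```python
import numpy as np


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries (one value per trait pair)."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = np.triu_indices(m.shape[0], k=1)
    return m[rows, cols]


def matrix_agreement(genetic_corr, phenotypic_corr):
    """Pearson correlation between matching trait-pair entries of two correlation matrices."""
    g = upper_triangle(genetic_corr)
    p = upper_triangle(phenotypic_corr)
    return np.corrcoef(g, p)[0, 1]


# Toy 3-trait matrices, invented for illustration only (not data from the study).
rg = np.array([[1.0, 0.5, 0.2],
               [0.5, 1.0, 0.1],
               [0.2, 0.1, 1.0]])
rp = np.array([[1.0, 0.4, 0.15],
               [0.4, 1.0, 0.05],
               [0.15, 0.05, 1.0]])
print(upper_triangle(rg))  # entries for pairs (1,2), (1,3), (2,3)
print(matrix_agreement(rg, rp))
```

Trait pairs that share a trait are not independent observations. A significance test for this kind of matrix comparison therefore usually relies on permutation of trait labels (a Mantel test) rather than the naive p-value for a Pearson correlation.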
Causes of the well-documented association between low levels of cognitive functioning and many adverse neuropsychiatric outcomes, poorer physical health and earlier death remain unknown. We used linkage disequilibrium regression and polygenic profile scoring to test for shared genetic aetiology between cognitive functions and neuropsychiatric disorders and physical health. Using information provided by many published genome-wide association study consortia, we created polygenic profile scores for 24 vascular-metabolic, neuropsychiatric, physiological-anthropometric and cognitive traits in the participants of UK Biobank, a very large population-based sample (n = 112 151). Pleiotropy between cognitive and health traits was quantified by deriving genetic correlations using summary genome-wide association study statistics and the method of linkage disequilibrium score regression. Substantial and statistically-significant genetic correlations were observed between cognitive test scores in the sample and many of the mental and physical health-related traits and disorders assessed here. In addition, highly statistically-significant associations were observed between the cognitive test scores in the sample and many polygenic profile scores, including coronary artery disease, stroke, Alzheimer’s disease, schizophrenia, autism, major depressive disorder, body mass index, intracranial volume, infant head circumference and childhood cognitive ability. Where disease diagnosis was available for participants, we were able to show that these results were not confounded by those who had the relevant disease. These findings indicate that a substantial level of pleiotropy exists between cognitive abilities and many human mental and physical health disorders and traits and that it can be used to predict phenotypic variance across samples.
Individuals with lower socio-economic status (SES) are at increased risk of physical and mental illnesses and tend to die at an earlier age. Explanations for the association between SES and health typically focus on factors that are environmental in origin. However, common single nucleotide polymorphisms (SNPs) have been found collectively to explain around 18% (SE = 5%) of the phenotypic variance of an area-based social deprivation measure of SES. Molecular genetic studies have also shown that physical and psychiatric diseases are at least partly heritable. It is possible, therefore, that phenotypic associations between SES and health arise partly due to a shared genetic etiology.
We conducted a genome-wide association study (GWAS) on social deprivation and on household income using the 112,151 participants of UK Biobank. We find that common SNPs explain 21% (SE = 0.5%) of the variation in social deprivation and 11% (SE = 0.7%) in household income. Two independent SNPs attained genome-wide significance for household income: rs187848990 on chromosome 2 and rs8100891 on chromosome 19. Genes in the regions of these SNPs have been associated with intellectual disabilities, schizophrenia, and synaptic plasticity. Extensive genetic correlations were found between both measures of SES and illnesses, anthropometric variables, psychiatric disorders, and cognitive ability.
These findings show that some SNPs associated with SES are involved in the brain and central nervous system. The genetic associations with SES are probably mediated via other partly-heritable variables, including cognitive ability, education, personality, and health.
Background: There is now convincing evidence that pleiotropy across the genome contributes to the correlation between human traits and comorbidity of diseases. The recent availability of genome-wide association study (GWAS) results has made the polygenic risk score (PRS) approach a powerful way to perform genetic prediction and identify genetic overlap among phenotypes.
Methods and findings: Here we use the PRS method to assess evidence for shared genetic aetiology across hundreds of traits within a single epidemiological study, the Northern Finland Birth Cohort 1966 (NFBC1966). We replicate numerous recent findings, such as a genetic association between Alzheimer’s disease and lipid levels, while the depth of phenotyping in the NFBC1966 highlights a range of novel significant genetic associations between traits.
Conclusion: This study illustrates the power in taking a hypothesis-free approach to the study of shared genetic aetiology between human traits and diseases. It also demonstrates the potential of the PRS method to provide important biological insights using only a single well-phenotyped epidemiological study of moderate sample size (~5k), with important advantages over evaluating genetic correlations from summary statistics only.
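At its core, a polygenic score is a weighted sum: each individual's effect-allele dosages are multiplied by per-allele effect sizes taken from an external GWAS and then added up. The sketch below shows only that step, followed by standardization so that the score can enter a regression against a phenotype. Real PRS pipelines usually also select SNPs (for example by LD clumping and a P-value threshold), and this sketch omits that. The names and toy data are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, effect_sizes):
    """Polygenic score per individual: sum over SNPs of dosage * GWAS effect size.

    dosages: (n_individuals, n_snps) counts of the effect allele (0, 1 or 2).
    effect_sizes: (n_snps,) per-allele effect estimates from an external GWAS.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(effect_sizes, dtype=float)


def standardize(scores):
    """Center and scale scores to mean 0 and standard deviation 1."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()


# Toy data, invented for illustration: 3 individuals, 4 SNPs.
dosages = np.array([[0, 1, 2, 0],
                    [1, 1, 0, 2],
                    [2, 0, 1, 1]])
betas = np.array([0.10, -0.05, 0.20, 0.02])
scores = polygenic_score(dosages, betas)
print(scores)               # approximately 0.35, 0.09, 0.42
print(standardize(scores))
```

Testing for shared aetiology then amounts to regressing a second phenotype measured in the cohort on the standardized score, typically with ancestry principal components as covariates.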
Background: Identifying genetic relationships between complex traits in emerging adulthood can provide useful etiological insights into risk for psychopathology. College-age individuals are under-represented in genomic analyses thus far, and the majority of work has focused on the clinical disorder or cognitive abilities rather than normal-range behavioral outcomes.
Methods: This study examined a sample of emerging adults 18–22 years of age (n = 5947) to construct an atlas of polygenic risk for 33 traits predicting relevant phenotypic outcomes. 28 hypotheses were tested based on the previous literature on samples of European ancestry, and the availability of rich assessment data allowed for polygenic predictions across 55 psychological and medical phenotypes.
Results: Polygenic risk for schizophrenia (SZ) in emerging adults predicted anxiety, depression, nicotine use, trauma, and family history of psychological disorders. Polygenic risk for neuroticism predicted anxiety, depression, phobia, panic, and neuroticism, and was correlated with polygenic risk for cardiovascular disease.
Conclusions: These results demonstrate the extensive impact of genetic risk for SZ, neuroticism, and major depression on a range of health outcomes in early adulthood. Minimal cross-ancestry replication of these phenomic patterns of polygenic influence underscores the need for more genome-wide association studies of non-European populations.
Genomic analysis of longevity offers the potential to illuminate the biology of human aging. Here, using genome-wide association meta-analysis of 606,059 parents’ survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA). We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity. Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated. We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD. Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.
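The abstract does not say which instrumental-variable estimator was used. A common choice for GWAS summary statistics is the fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate. It combines per-variant Wald ratios (outcome effect divided by exposure effect), weighting each ratio by its first-order inverse variance. The sketch below implements that generic estimator. It is valid only if the variants affect the outcome solely through the exposure. The function name and toy numbers are hypothetical.

```python
import numpy as np


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate.

    Each variant gives a Wald ratio beta_outcome / beta_exposure with first-order
    standard error se_outcome / |beta_exposure|; the ratios are averaged with
    weights 1 / se^2 = beta_exposure^2 / se_outcome^2.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    sy = np.asarray(se_outcome, dtype=float)
    ratios = by / bx                # per-variant causal estimates
    weights = bx ** 2 / sy ** 2     # inverse variance of each ratio
    estimate = np.sum(weights * ratios) / np.sum(weights)
    se = 1.0 / np.sqrt(np.sum(weights))
    return estimate, se


# Toy summary statistics (invented): outcome effects are exactly -0.5 times
# the exposure effects, so the IVW estimate should be -0.5.
bx = np.array([0.10, 0.20, 0.05])
by = -0.5 * bx
sy = np.array([0.01, 0.01, 0.01])
est, se = ivw_estimate(bx, by, sy)
print(est, se)
```

The IVW estimate equals the slope of a weighted regression of outcome effects on exposure effects through the origin. Pleiotropic instruments bias it, which is why robust variants such as MR-Egger or median-based estimators are often reported alongside it.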
Intelligence, or general cognitive function, is phenotypically and genetically correlated with many traits, including a wide range of physical and mental health variables. Education is strongly genetically correlated with intelligence (rg = 0.70). We used these findings as foundations for our use of a novel approach, multi-trait analysis of genome-wide association studies (MTAG; Turley et al. 2017), to combine two large genome-wide association studies (GWASs) of education and intelligence, increasing statistical power and resulting in the largest GWAS of intelligence yet reported. Our study had four goals: first, to facilitate the discovery of new genetic loci associated with intelligence; second, to add to our understanding of the biology of intelligence differences; third, to examine whether combining traits in this way produces results consistent with the primary phenotype of intelligence; and, finally, to test how well this new meta-analytic data sample on intelligence predicts phenotypic intelligence in an independent sample. By combining datasets using MTAG, our functional sample size increased from 199,242 participants to 248,482. We found 187 independent loci associated with intelligence, implicating 538 genes, using both SNP-based and gene-based GWAS. We found evidence that neurogenesis and myelination, as well as genes expressed in the synapse and those involved in the regulation of the nervous system, may explain some of the biological differences in intelligence. The results of our combined analysis demonstrated the same pattern of genetic correlations as those from previous GWASs of intelligence, providing support for the meta-analysis of these genetically-related phenotypes.
After a decade of genome-wide association studies (GWASs), fundamental questions in human genetics are still unanswered, such as the extent of pleiotropy across the genome, the nature of trait-associated genetic variants and the disparate genetic architecture across human traits. The current availability of hundreds of GWAS results provides the unique opportunity to gain insight into these questions. In this study, we harmonized and systematically analysed 4,155 publicly available GWASs. For a subset of well-powered GWASs on 558 unique traits, we provide an extensive overview of pleiotropy and genetic architecture. We show that trait associated loci cover more than half of the genome, and 90% of those loci are associated with multiple trait domains. We further show that potential causal genetic variants are enriched in coding and flanking regions, as well as in regulatory elements, and how trait-polygenicity is related to an estimate of the required sample size to detect 90% of causal genetic variants. Our results provide novel insights into how genetic variation contributes to trait variation. All GWAS results can be queried and visualized at the ATLAS resource (http://atlas.ctglab.nl).
“Genetic Consequences of Social Stratification in Great Britain”, (2018-10-30):
Human DNA varies across geographic regions, with most variation observed so far reflecting distant ancestry differences. Here, we investigate the geographic clustering of genetic variants that influence complex traits and disease risk in a sample of ~450,000 individuals from Great Britain. Out of 30 traits analyzed, 16 show significant geographic clustering at the genetic level after controlling for ancestry, likely reflecting recent migration driven by socio-economic status (SES). Alleles associated with educational attainment (EA) show most clustering, with EA-decreasing alleles clustering in lower-SES areas such as coal mining areas. Individuals who leave coal mining areas carry more EA-increasing alleles on average than the rest of Great Britain. In addition, we leveraged the geographic clustering of complex trait variation to further disentangle regional differences in socio-economic and cultural outcomes through genome-wide association studies on publicly available regional measures, namely coal mining, religiousness, 1970/2015 general election outcomes, and Brexit referendum results.
2019-liu.pdf: “Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use”, Mengzhen Liu, Yu Jiang, Robbee Wedow, Yue Li, David M. Brazel, Fang Chen, Gargi Datta, Jose Davila-Velderrain, Daniel McGuire, Chao Tian, Xiaowei Zhan, Michelle Agee, Babak Alipanahi, Adam Auton, Robert K. Bell, Katarzyna Bryc, Sarah L. Elson, Pierre Fontanillas, Nicholas A. Furlotte, David A. Hinds, Bethann S. Hromatka, Karen E. Huber, Aaron Kleinman, Nadia K. Litterman, Matthew H. McIntyre, Joanna L. Mountain, Carrie A. M. Northover, J. Fah Sathirapongsasuti, Olga V. Sazonova, Janie F. Shelton, Suyash Shringarpure, Chao Tian, Joyce Y. Tung, Vladimir Vacic, Catherine H. Wilson, Steven J. Pitts, Amy Mitchell, Anne Heidi Skogholt, Bendik S. Winsvold, Børge Sivertsen, Eystein Stordal, Gunnar Morken, Håvard Kallestad, Ingrid Heuch, John-Anker Zwart, Katrine Kveli Fjukstad, Linda M. Pedersen, Maiken Elvestad Gabrielsen, Marianne Bakke Johnsen, Marit Skrove, Marit Sæbø Indredavik, Ole Kristian Drange, Ottar Bjerkeset, Sigrid Børte, Synne Øien Stensland, Hélène Choquet, Anna R. Docherty, Jessica D. Faul, Johanna R. Foerster, Lars G. Fritsche, Maiken Elvestad Gabrielsen, Scott D. Gordon, Jeffrey Haessler, Jouke-Jan Hottenga, Hongyan Huang, Seon-Kyeong Jang, Philip R. Jansen, Yueh Ling, Reedik Mägi, Nana Matoba, George McMahon, Antonella Mulas, Valeria Orrù, Teemu Palviainen, Anita Pandit, Gunnar W. Reginsson, Anne Heidi Skogholt, Jennifer A. Smith, Amy E. Taylor, Constance Turman, Gonneke Willemsen, Hannah Young, Kendra A. Young, Gregory J. M. Zajac, Wei Zhao, Wei Zhou, Gyda Bjornsdottir, Jason D. Boardman, Michael Boehnke, Dorret I. Boomsma, Chu Chen, Francesco Cucca, Gareth E. Davies, Charles B. Eaton, Marissa A. Ehringer, Tõnu Esko, Edoardo Fiorillo, Nathan A. Gillespie, Daniel F. Gudbjartsson, Toomas Haller, Kathleen Mullan Harris, Andrew C. Heath, John K. Hewitt, Ian B. Hickie, John E. Hokanson, Christian J. Hopfer, David J. Hunter, William G. Iacono, Eric O. Johnson, Yoichiro Kamatani, Sharon L. R. Kardia, Matthew C. Keller, Manolis Kellis, Charles Kooperberg, Peter Kraft, Kenneth S. Krauter, Markku Laakso, Penelope A. Lind, Anu Loukola, Sharon M. Lutz, Pamela A. F. Madden, Nicholas G. Martin, Matt McGue, Matthew B. McQueen, Sarah E. Medland, Andres Metspalu, Karen L. Mohlke, Jonas B. Nielsen, Yukinori Okada, Ulrike Peters, Tinca J. C. Polderman, Danielle Posthuma, Alexander P. Reiner, John P. Rice, Eric Rimm, Richard J. Rose, Valgerdur Runarsdottir, Michael C. Stallings, Alena Stančáková, Hreinn Stefansson, Khanh K. Thai, Hilary A. Tindle, Thorarinn Tyrfingsson, Tamara L. Wall, David R. Weir, Constance Weisner, John B. Whitfield, Bendik Slagsvold Winsvold, Jie Yin, Luisa Zuccolo, Laura J. Bierut, Kristian Hveem, James J. Lee, Marcus R. Munafò, Nancy L. Saccone, Cristen J. Willer, Marilyn C. Cornelis, Sean P. David, David A. Hinds, Eric Jorgenson, Jaakko Kaprio, Jerry A. Stitzel, Kari Stefansson, Thorgeir E. Thorgeirsson, Gonçalo Abecasis, Dajiang J. Liu, Scott Vrieze