order statistics directory

Links

“Assessing the Response to Genomic Selection by Simulation”, Buntaran et al 2022

“Assessing the response to genomic selection by simulation”⁠, Harimurti Buntaran, Angela Maria Berna-Vasquez, Andres Gordillo, Valentin Wimmer, Morten Sahr, Hans-Peter Piepho et al (2022-01-20; ; similar):

The goal of any plant breeding program is to maximize genetic gain for traits of interest. In classical quantitative genetics, the genetic gain can be obtained from what is known as the breeder’s equation. In the past, only phenotypic data were used to compute the genetic gain. The advent of genomic prediction has opened the door to the utilization of dense markers for estimating genomic breeding values (GBV). The salient feature of genomic prediction is the possibility of carrying out genomic selection with the assistance of the kinship matrix, hence improving the prediction accuracy and accelerating the breeding cycle. However, estimates of GBV as such do not provide the full information on the number of entries to be selected as in the classical response to selection. In this paper, we use simulation, based on a fitted mixed model for genomic prediction in a multi-environmental framework, to answer two typical questions of a plant breeder: (1) How many entries need to be selected to have a defined probability of selecting the truly best entry from the population? (2) What is the probability of obtaining the truly best entries when some top-ranked entries are selected?
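
Not the paper's mixed-model machinery, but a minimal R sketch of the second question under simplified assumptions: true genetic values and estimated GBVs are treated as bivariate normal with correlation equal to an assumed prediction accuracy, and we ask how often the truly best entry lands in the selected top k (all parameter values below are illustrative):

```r
## Probability that the truly best of `n.entries` entries is captured when
## selecting the top `k` by estimated GBV, assuming true values and estimates
## are bivariate normal with correlation `accuracy` (illustrative values).
set.seed(1)
p.capture <- function(n.entries=500, k=25, accuracy=0.7, n.sim=5000) {
    mean(replicate(n.sim, {
        g   <- rnorm(n.entries)                                  # true genetic values
        gbv <- accuracy*g + sqrt(1-accuracy^2)*rnorm(n.entries)  # noisy GBV estimates
        which.max(g) %in% order(gbv, decreasing=TRUE)[1:k]       # best entry selected?
    }))
}
c(top10 = p.capture(k=10), top50 = p.capture(k=50))
```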

“On Extensions of Rank Correlation Coefficients to Multivariate Spaces”, Han 2021

“On extensions of rank correlation coefficients to multivariate spaces”⁠, Fang Han (2021-10-07; backlinks; similar):

This note summarizes ideas presented in a lecture for the Bernoulli New Researcher Award 2021.

Rank correlations for measuring and testing against dependence of 2 random scalars are among the oldest and best-known topics in nonparametric statistics⁠.

This note reviews recent progress towards understanding and extending rank correlations to multivariate spaces through building connections to optimal transport and graph-based statistics.

“A Review of the Gumbel-max Trick and Its Extensions for Discrete Stochasticity in Machine Learning”, Huijben et al 2021

“A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning”⁠, Iris A. M. Huijben, Wouter Kool, Max B. Paulus, Ruud J. G. van Sloun (2021-10-04; ; backlinks; similar):

The Gumbel-max trick is a method to draw a sample from a categorical distribution⁠, given by its unnormalized (log-)probabilities. Over the past years, the machine learning community has proposed several extensions of this trick to facilitate, eg. drawing multiple samples, sampling from structured domains, or gradient estimation for error backpropagation in neural network optimization.
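
For concreteness, a minimal R demonstration of the basic trick (mine, not from the survey): adding independent Gumbel(0,1) noise to the unnormalized log-probabilities and taking the argmax yields exact samples from the corresponding categorical distribution.

```r
## Gumbel-max trick: argmax(log w + Gumbel noise) ~ Categorical(w / sum(w)).
set.seed(1)
w       <- c(2, 5, 1, 0.5)                   # unnormalized probabilities
rgumbel <- function(n) -log(-log(runif(n)))  # standard Gumbel(0,1) samples
draws   <- replicate(1e5, which.max(log(w) + rgumbel(length(w))))
rbind(empirical = as.numeric(table(draws)) / 1e5,
      exact     = w / sum(w))
```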

The goal of this survey article is to present background about the Gumbel-max trick, and to provide a structured overview of its extensions to ease algorithm selection.

Moreover, it presents a comprehensive outline of (machine learning) literature in which Gumbel-based algorithms have been leveraged, reviews commonly-made design choices, and sketches a future perspective.

“Human Mortality at Extreme Age”, Belzile et al 2021

“Human mortality at extreme age”⁠, Léo R. Belzile, Anthony C. Davison, Holger Rootzén, Dmitrii Zholud (2021-09-29; ; similar):

We use a combination of extreme value statistics⁠, survival analysis and computer-intensive methods to analyse the mortality of Italian and French semi-supercentenarians.

After accounting for the effects of the sampling frame, extreme-value modelling leads to the conclusion that a constant force of mortality beyond 108 years describes the data well and there is no evidence of differences between countries and cohorts. These findings are consistent with use of a Gompertz model and with previous analysis of the International Database on Longevity and suggest that any physical upper bound for the human lifespan is so large that it is unlikely to be approached.

Power calculations make it implausible that there is an upper bound below 130 years. There is no evidence of differences in survival between women and men after age 108 in the Italian data and the International Database on Longevity, but survival is lower for men in the French data.
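
As a back-of-the-envelope illustration (mine, not the paper's): under a constant force of mortality, remaining lifetime past 108 is exponential, so survival probabilities take one line of R. The annual death probability of ~0.5 below is an assumed plateau value used only for illustration.

```r
## Constant-hazard illustration with an assumed annual death probability of 0.5
## past age 108; survival to any later age is then exponential in the gap.
p.death.annual <- 0.5                        # assumed plateau value (illustrative)
lambda         <- -log(1 - p.death.annual)   # hazard, ~0.69 per year
p.108.to.130   <- exp(-lambda * (130 - 108))
c(p.survive = p.108.to.130,                  # ~2e-7 per person reaching 108
  needed    = 1 / p.108.to.130)              # ~4 million 108-year-olds per expected 130-year-old
```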

“Artificial Intelligence in Drug Discovery: What Is Realistic, What Are Illusions? Part 1: Ways to Make an Impact, and Why We Are Not There Yet: Quality Is More Important Than Speed and Cost in Drug Discovery”, Bender & Cortés-Ciriano 2021

“Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet: Quality is more important than speed and cost in drug discovery”⁠, Andreas Bender, Isidro Cortés-Ciriano (2021-02; ; backlinks; similar):

We first attempted to simulate the effect of (1) speeding up phases in the drug discovery process, (2) making them cheaper and (3) making individual phases more successful on the overall financial outcome of drug-discovery projects. In every case, an improvement of the respective measure (speed, cost and success of phase) of 20% (in the case of failure rate in relative terms) has been assumed to quantify effects on the capital cost of bringing one successful drug to the market. For the simulations, a patent lifetime of 20 years was assumed, with patent applications filed at the start of clinical Phase I, and the net effect of changes of speed, cost and quality of decisions on overall project return was calculated, assuming that projects, on average, are able to return their own cost…(Studies such as [33], which posed the question of which changes are most efficient in terms of improving R&D productivity, returned similar results to those presented here, although we have quantified them in more detail.)

It can be seen in Figure 2 that a reduction of the failure rate (in particular across all clinical phases) has by far the most substantial impact on project value overall, multiple times that of a reduction of the cost of a particular phase or a decrease in the amount of time a particular phase takes. This effect is most profound in clinical Phase II, in agreement with previous studies [33], and it is a result of the relatively low success rate, long duration and high cost of the clinical phases. In other words, increasing the success of clinical phases decreases the number of expensive clinical trials needed to bring a drug to the market, and this decrease in the number of failures matters more than failing more quickly or more cheaply in terms of cost per successful, approved drug.

Figure 2: The impact of increasing speed (with the time taken for each phase reduced by 20%), improving the quality of the compounds tested in each phase (with the failure rate reduced by 20%), and decreasing costs (by 20%) on the net profit of a drug-discovery project, assuming patenting at time of first in human tests, and with other assumptions based on [“When Quality Beats Quantity: Decision Theory, Drug Discovery, and the Reproducibility Crisis”, Scannell & Bosley 2016]. It can be seen that the quality of compounds taken forward has a much more profound impact on the success of projects, far beyond improving the speed and reducing the cost of the respective phase. This has implications for the most beneficial uses of AI in drug-discovery projects.
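
A minimal R sketch of the kind of comparison behind Figure 2, with made-up phase success rates and costs (not the paper's parameters) and ignoring discounting and patent life, which is why a "faster" scenario is omitted and the value of speed is understated:

```r
## Expected out-of-pocket cost per approved drug: sum over phases of
## (phase cost) x (expected number of candidates entering that phase per approval).
## Phase probabilities and costs below are illustrative guesses.
cost.per.approval <- function(p.success, cost) {
    entrants <- 1 / rev(cumprod(rev(p.success)))  # candidates entering each phase per approval
    sum(entrants * cost)
}
p    <- c(PhaseI=0.6, PhaseII=0.35, PhaseIII=0.6, Approval=0.9)  # success probabilities
cost <- c(PhaseI=25,  PhaseII=60,   PhaseIII=255, Approval=5)    # $M per candidate per phase

c(baseline      = cost.per.approval(p, cost),
  cheaper.20pct = cost.per.approval(p, cost*0.8),        # every phase 20% cheaper
  quality.20pct = cost.per.approval(1 - (1-p)*0.8, cost)) # failure rates cut 20%
```

Even in this stripped-down version, cutting failure rates wins by a wide margin, for the reason the text gives: fewer expensive late-phase failures per approval.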

…When translating this to drug-discovery programmes, this means that AI needs to support:

  1. better compounds going into clinical trials (related to the structure itself, but also including the right dosing/​PK for suitable efficacy versus the safety/​therapeutic index, in the desired target tissue);
  2. better validated targets (to decrease the number of failures owing to efficacy, especially in clinical Phases II and III, which have a profound impact on overall project success and in which target validation is currently probably not yet where one would like it to be [35]);
  3. better patient selection (eg. using biomarkers) [31]; and
  4. better conductance of trials (with respect to, eg. patient recruitment and adherence) [36].

This finding is in line with previous research in the area cited already [33], as well as a study that compared the impact of the quality of decisions that can be made to the number of compounds that can be processed with a particular technique [30]. In this latter case, the authors found that: “when searching for rare positives (eg. candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/​or unknowable (ie. a 0.1 absolute change in correlation coefficient between model output and clinical outcomes in man) can offset large (eg. tenfold, even 100-fold) changes in models’ brute-force efficiency.” Still, currently the main focus of AI in drug discovery, in many cases, seems to be on speed and cost, as opposed to the quality of decisions.

“Counterproductive Altruism: The Other Heavy Tail”, Kokotajlo & Oprea 2020

2020-kokotajlo.pdf: “Counterproductive Altruism: The Other Heavy Tail”⁠, Daniel Kokotajlo, Alexandra Oprea (2020-05-30; ; backlinks; similar):

First, we argue that the appeal of effective altruism (henceforth, EA) depends substantially on a certain empirical premise we call the Heavy Tail Hypothesis (HTH), which characterizes the probability distribution of opportunities for doing good. Roughly, the HTH implies that the best causes, interventions, or charities produce orders of magnitude greater good than the average ones, constituting a substantial portion of the total amount of good caused by altruistic interventions. Next, we canvass arguments EAs have given for the existence of a positive (or “right”) heavy tail and argue that they can also apply in support of a negative (or “left”) heavy tail where counterproductive interventions do orders of magnitude more harm than ineffective or moderately harmful ones. Incorporating the other heavy tail of the distribution has important implications for the core activities of EA: effectiveness research, cause prioritization, and the assessment of altruistic interventions. It also informs the debate surrounding the institutional critique of EA.

“A New Coefficient of Correlation”, Chatterjee 2020

2020-chatterjee.pdf: “A New Coefficient of Correlation”⁠, Sourav Chatterjee (2020-05-28; ; similar):

[cf. Dette et al 2012⁠, Azadkia & Chatterjee 2019⁠, Lin & Han 2021⁠; nearest-neighbors per Han 2021’s overview.] Is it possible to define a coefficient of correlation which is (1) as simple as the classical coefficients like Pearson’s correlation or Spearman’s correlation⁠, and yet (2) consistently estimates some simple and interpretable measure of the degree of dependence between the variables, which is 0 if and only if the variables are independent and 1 if and only if one is a measurable function of the other, and (3) has a simple asymptotic theory under the hypothesis of independence, like the classical coefficients?

This article answers this question in the affirmative, by producing such a coefficient. No assumptions are needed on the distributions of the variables. There are several coefficients in the literature that converge to 0 if and only if the variables are independent, but none that satisfy any of the other properties mentioned above. Supplementary materials for this article are available online.

[Keywords: correlation, independence, measure of association]
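
The coefficient itself is short enough to show; a minimal R implementation of the formula from the paper for the continuous (no-ties) case, under my own function name `xi.cor`:

```r
## Chatterjee's xi (no-ties case): order the pairs by x, take the ranks of y,
## and compute 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1).
xi.cor <- function(x, y) {
    n <- length(x)
    r <- rank(y[order(x)])
    1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}
set.seed(1)
x <- runif(1000)
c(independent  = xi.cor(x, runif(1000)),              # ~0
  noisy.linear = xi.cor(x, x + rnorm(1000, sd=0.1)),  # intermediate
  functional   = xi.cor(x, sin(10*x)))                # near 1, though Pearson's r is ~0
```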

“A Simple Measure of Conditional Dependence”, Azadkia & Chatterjee 2019

“A simple measure of conditional dependence”⁠, Mona Azadkia, Sourav Chatterjee (2019-10-27; backlinks; similar):

We propose a coefficient of conditional dependence between two random variables Y and Z given a set of other variables X1,…,Xp, based on an i.i.d. sample.

The coefficient has a long list of desirable properties, the most important of which is that under absolutely no distributional assumptions, it converges to a limit in [0,1], where the limit is 0 if and only if Y and Z are conditionally independent given X1,…,Xp, and is 1 if and only if Y is equal to a measurable function of Z given X1,…,Xp. Moreover, it has a natural interpretation as a nonlinear generalization of the familiar partial R2 statistic for measuring conditional dependence by regression.

Using this statistic, we devise a new variable selection algorithm, called Feature Ordering by Conditional Independence (FOCI), which is model-free, has no tuning parameters, and is provably consistent under sparsity assumptions.

A number of applications to synthetic and real datasets are worked out.

“Low Base Rates Prevented Terman from Identifying Future Nobelists”, Warne et al 2019

“Low Base Rates Prevented Terman from Identifying Future Nobelists”⁠, Russell Warne, Ross Larsen, Jonathan Clark (2019-08-28; ; backlinks; similar):

Although the accomplishments of the 1,528 subjects of the Genetic Studies of Genius are impressive, they do not represent the pinnacle of human achievement. Since the early 1990s, commentators (eg. Bond, 2014; Gladwell, 2006; Heilman, 2016; Shurkin, 1992) have drawn attention to the fact that two future Nobelists—William Shockley and Luis Alvarez—were among the 168,000 candidates screened for the study; but they were rejected because their IQ scores were too low. Critics see this as a flaw of Terman’s methodology and/​or intelligence testing. However, events with a low base rate (such as winning a Nobel prize) are difficult to predict (Taylor & Russell 1939).

This study simulates Terman’s sampling procedure to estimate the probability that it would have selected one or both future Nobelists from a population of 168,000 candidates. Using data simulations, we created a model that realistically reflected the test-retest and split-half reliability of the IQ scores used to select individuals for the Genetic Studies of Genius and the relationship between IQ and Nobelist status.

Results showed that it was unlikely for Terman to identify children who would later earn Nobel prizes, mostly due to the low base rates of such high future achievement and the high minimum IQ needed to be selected for Terman’s study.

Changes to the methodology that would have been required to select one or both Nobelists for the longitudinal study were not practical. Therefore, Alvarez’s and Shockley’s absence from the Genetic Studies of Genius sample does not invalidate intelligence testing or Terman’s landmark study.
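
A compressed R sketch in the spirit of their simulation (parameter values are illustrative, not Warne et al's): even a child whose true score is near or above the cutoff is often missed by a single imperfectly reliable administration, before the Nobel base rate is even considered.

```r
## Probability that a child with a given true score clears a one-shot test cutoff,
## under classical test theory (observed = true + error); the reliability and cutoff
## values are illustrative, not Warne et al's exact model.
p.selected <- function(true.iq, cutoff=140, reliability=0.9, sd=15)
    1 - pnorm(cutoff, mean=true.iq, sd=sd*sqrt(1-reliability))
round(sapply(c(IQ130=130, IQ140=140, IQ150=150), p.selected), 2)
## A child sitting exactly at the cutoff is missed half the time; the ~1-in-millions
## Nobel base rate then makes capturing any particular future laureate unlikely.
```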

“Scale-free Networks Are Rare”, Broido & Clauset 2019

“Scale-free networks are rare”⁠, Anna D. Broido, Aaron Clauset (2019-03-04; backlinks; similar):

Real-world networks are often claimed to be scale free⁠, meaning that the fraction of nodes with degree k follows a power law k^−α, a pattern with broad implications for the structure and dynamics of complex systems. However, the universality of scale-free networks remains controversial.

Here, we organize different definitions of scale-free networks and construct a severe test of their empirical prevalence using state-of-the-art statistical tools applied to nearly 1,000 social, biological, technological, transportation, and information networks.

Across these networks, we find robust evidence that strongly scale-free structure is empirically rare, while for most networks, log-normal distributions fit the data as well or better than power laws. Furthermore, social networks are at best weakly scale free, while a handful of technological and biological networks appear strongly scale free.

These findings highlight the structural diversity of real-world networks and the need for new theoretical explanations of these non-scale-free patterns.

[Keywords: complex networks, network topology, power law, statistical methods]

…The log-normal is a broad distribution that can exhibit heavy tails⁠, but which is nevertheless not scale free. Empirically, the log-normal is favored over the power law more than 3× as often (48% vs 12%), and the comparison is inconclusive in a large number of cases (40%). In other words, the log-normal is at least as good a fit as the power law for the vast majority of degree distributions (88%), suggesting that many previously identified scale-free networks may in fact be log-normal networks.

“Open Questions”, Branwen 2018

Questions: “Open Questions”⁠, Gwern Branwen (2018-10-17; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Some anomalies/​questions which are not necessarily important, but do puzzle me or where I find existing explanations to be unsatisfying.

A list of some questions which are not necessarily important, but do puzzle me or where I find existing ‘answers’ to be unsatisfying, categorized by subject (along the lines of Patrick Collison’s list & Alex Guzey⁠; see also my list of project ideas).

“Dog Cloning For Special Forces: Breed All You Can Breed”, Branwen 2018

Clone: “Dog Cloning For Special Forces: Breed All You Can Breed”⁠, Gwern Branwen (2018-09-18; ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Decision analysis of whether cloning the most elite Special Forces dogs is a profitable improvement over standard selection procedures. Unless training is extremely cheap or heritability is extremely low, dog cloning is hypothetically profitable.

Cloning is widely used in animal & plant breeding despite steep costs due to its advantages; more unusual recent applications include creating entire polo horse teams and reported trials of cloning in elite police/​Special Forces war dogs. Given the cost of dog cloning, however, can this ever make more sense than standard screening methods for selecting from working dog breeds, or would the increase in successful dog training be too low under all reasonable models to turn a profit?

I model the question as one of expected cost per dog with the trait of successfully passing training, success in training being a dichotomous liability threshold with a polygenic genetic architecture; given the extreme level of selection possible in selecting the best among already-elite Special Forces dogs and a range of heritabilities, this predicts clones’ success probabilities. To approximate the relevant parameters, I look at some reported training costs and success rates for regular dog candidates, broad dog heritabilities, and the few current dog cloning case studies reported in the media.
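
The liability-threshold step condenses to a few lines of R (illustrative parameter values; the essay itself sweeps a grid of them): regress the donor's phenotypic deviation back to a genotypic value, then ask how often a clone carrying that genotype clears the training threshold.

```r
## Liability-threshold sketch with illustrative parameters (not the essay's full model).
## Training success is a 0/1 trait with base rate p.base; liability is standard normal.
p.clone.passes <- function(p.base=0.5, h2=0.5, donor.z=3) {
    t         <- qnorm(1 - p.base)   # liability threshold for passing
    mu.clone  <- h2 * donor.z        # donor genotype regressed from phenotype: E[G|P] = h2*P
    var.clone <- h2*(1-h2) + (1-h2)  # genotype-prediction error + clone's new environment
    1 - pnorm(t, mean=mu.clone, sd=sqrt(var.clone))
}
## Clone of a +3 SD elite donor vs an average candidate, at a 50% base pass rate:
c(clone = p.clone.passes(donor.z=3), average.candidate = 0.5)
```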

Since none of the relevant parameters are known with confidence, I run the cost-benefit equation for many hypothetical scenarios, and find that in a large fraction of them covering most plausible values, dog cloning would improve training yields enough to be profitable (in addition to its other advantages).

As further illustration of the use-case of screening for an extreme outcome based on a partial predictor, I consider the question of whether height PGSes could be used to screen the US population for people of NBA height, which turns out to be reasonably doable with current & future PGSes.

“SMPY Bibliography”, Branwen 2018

SMPY: “SMPY Bibliography”⁠, Gwern Branwen (2018-07-28; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

An annotated fulltext bibliography of publications on the Study of Mathematically Precocious Youth (SMPY), a longitudinal study of high-IQ youth.

SMPY (Study of Mathematically Precocious Youth) is a long-running longitudinal survey of extremely mathematically-talented or intelligent youth, which has been following high-IQ cohorts since the 1970s. It has provided the largest and most concrete findings about the correlates and predictive power of screening extremely intelligent children, and revolutionized gifted & talented educational practices.

Because it has been running for over 40 years, SMPY-related publications are difficult to find; many early papers were published only in long-out-of-print books and are not available in any other way. Others are digitized and more accessible, but one must already know they exist. Between these barriers, SMPY information is less widely available & used than it should be given its importance.

To fix this, I have been gradually going through all SMPY citations and making fulltext copies available online with occasional commentary.

“Construction of Arbitrarily Strong Amplifiers of Natural Selection Using Evolutionary Graph Theory”, Pavlogiannis et al 2018

“Construction of arbitrarily strong amplifiers of natural selection using evolutionary graph theory”⁠, Andreas Pavlogiannis, Josef Tkadlec, Krishnendu Chatterjee, Martin A. Nowak (2018-06-14; ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

[WP] Because of the intrinsic randomness of the evolutionary process, a mutant with a fitness advantage has some chance to be selected but no certainty. Any experiment that searches for advantageous mutants will lose many of them due to random drift. It is therefore of great interest to find population structures that improve the odds of advantageous mutants. Such structures are called amplifiers of natural selection: they increase the probability that advantageous mutants are selected. Arbitrarily strong amplifiers guarantee the selection of advantageous mutants, even for very small fitness advantage. Despite intensive research over the past decade, arbitrarily strong amplifiers have remained rare. Here we show how to construct a large variety of them. Our amplifiers are so simple that they could be useful in biotechnology, when optimizing biological molecules, or as a diagnostic tool, when searching for faster dividing cells or viruses. They could also occur in natural population structures.

In the evolutionary process, mutation generates new variants, while selection chooses between mutants that have different reproductive rates. Any new mutant is initially present at very low frequency and can easily be eliminated by random drift⁠. The probability that the lineage of a new mutant eventually takes over the entire population is called the fixation probability⁠. It is a key quantity of evolutionary dynamics and characterizes the rate of evolution.

…In this work we resolve several open questions regarding strong amplification under uniform and temperature initialization. First, we show that there exists a vast variety of graphs with self-loops and weighted edges that are arbitrarily strong amplifiers for both uniform and temperature initialization. Moreover, many of those strong amplifiers are structurally simple, therefore they might be realizable in natural or laboratory setting. Second, we show that both self-loops and weighted edges are key features of strong amplification. Namely, we show that without either self-loops or weighted edges, no graph is a strong amplifier under temperature initialization, and no simple graph is a strong amplifier under uniform initialization.

…In general, the fixation probability depends not only on the graph, but also on the initial placement of the invading mutants…For a wide class of population structures [17], which include symmetric ones [28], the fixation probability is the same as for the well-mixed population.

… A population structure is an arbitrarily strong amplifier (for brevity hereafter also called “strong amplifier”) if it ensures a fixation probability arbitrarily close to one for any advantageous mutant, r > 1. Strong amplifiers can only exist in the limit of large population size.

Numerical studies [30] suggest that for spontaneously arising mutants and small population size, many unweighted graphs amplify for some values of r. But for a large population size, randomly constructed, unweighted graphs do not amplify [31]. Moreover, proven amplifiers for all values of r are rare. For spontaneously arising mutants (uniform initialization): (1) the Star has fixation probability of ~1 − 1/r² in the limit of large N, and is thus an amplifier [17, 32, 33]; (2) the Superstar (introduced in ref. 17, see also ref. 34) and the Incubator (introduced in refs. 35, 36), which are graphs with unbounded degree, are strong amplifiers.
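
For orientation (standard results, not this paper's constructions), the well-mixed Moran fixation probability and the Star's large-N limit are one-liners in R:

```r
## Fixation probability of a single mutant with relative fitness r under the Moran
## process: well-mixed population of size N vs the Star's large-N approximation.
rho.well.mixed <- function(r, N) (1 - 1/r) / (1 - 1/r^N)
rho.star.limit <- function(r) 1 - 1/r^2
r <- 1.1
c(well.mixed.N100 = rho.well.mixed(r, 100),  # ~0.09
  star.large.N    = rho.star.limit(r))       # ~0.17: the Star amplifies selection
## A strong amplifier would push this probability toward 1 for any r > 1.
```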

Figure 1: Evolutionary dynamics in structured populations. Residents (yellow) and mutants (purple) differ in their reproductive rate. (a) A single mutant appears. The lineage of the mutant becomes extinct or reaches fixation. The probability that the mutant takes over the population is called “fixation probability”. (b) The classical, well-mixed population is described by a complete graph with self-loops. (Self-loops are not shown here.) (c) Isothermal structures do not change the fixation probability compared to the well-mixed population. (d) The Star is an amplifier for uniform initialization. (e) A self-loop means the offspring can replace the parent. Self-loops are a mathematical tool to assign different reproduction rates to different places. (f) The Superstar, which has unbounded degree in the limit of large population size, is a strong amplifier for uniform initialization. Its edges (shown as arrows) are directed which means that the connections are one-way.
Figure 4: Infinite variety of strong amplifiers. Many topologies can be turned into arbitrarily strong amplifiers (Wheel (a), Triangular grid (b), Concentric circles (c), and Tree (d)). Each graph is partitioned into hub (orange) and branches (blue). The weights can be then assigned to the edges so that we obtain arbitrarily strong amplifiers. Thick edges receive large weights, whereas thin edges receive small (or zero) weights

…Intuitively, the weight assignment creates a sense of global flow in the branches, directed toward the hub. This guarantees that the first 2 steps happen with high probability. For the third step, we show that once the mutants fixate in the hub, they are extremely likely to resist all resident invasion attempts and instead they will invade and take over the branches one by one thereby fixating on the whole graph. For more detailed description, see “Methods” section “Construction of strong amplifiers”.

Necessary conditions for amplification: Our main result shows that a large variety of population structures can provide strong amplification. A natural follow-up question concerns the features of population structures under which amplification can emerge. We complement our main result by proving that both weights and self-loops are essential for strong amplification. Thus, we establish a strong dichotomy. Without either weights or self-loops, no graph can be a strong amplifier under temperature initialization, and no simple graph can be a strong amplifier under uniform initialization. On the other hand, if we allow both weights and self-loops, strong amplification is ubiquitous.

…Some naturally occurring population structures could be amplifiers of natural selection⁠. For example, the germinal centers of the immune system might constitute amplifiers for the affinity maturation process of adaptive immunity [46]. Habitats of animals that are divided into multiple islands with a central breeding location could potentially also act as amplifiers of selection. Our theory helps to identify those structures in natural settings.

“Nature vs. Nurture: Have Performance Gaps Between Men and Women Reached an Asymptote?”, Millard-Stafford et al 2018

2018-millardstafford.pdf: “Nature vs. Nurture: Have Performance Gaps Between Men and Women Reached an Asymptote?”⁠, Mindy Millard-Stafford, Ann E. Swanson, Matthew T. Wittbrodt (2018-04; ⁠, ; similar):

Men outperform women in sports requiring muscular strength and/​or endurance, but the relative influence of “nurture” versus “nature” remains difficult to quantify. Performance gaps between elite men and women are well documented using world records in second, centimeter, or kilogram sports. However, this approach is biased by global disparity in reward structures and opportunities for women.

Despite policies enhancing female participation (Title IX legislation), US women only closed performance gaps by 2% and 5% in Olympic Trial swimming and running, respectively, from 1972 to 1980 (with no change thereafter through 2016). Performance gaps of 13% in elite mid-distance running and 8% in swimming (~4-min duration) remain, the 5% differential between sports indicative of load carriage disadvantages of higher female body fatness in running. Conversely, sprint swimming exhibits a greater sex difference than sprint running, suggesting anthropometric/​power advantages unique to swim-block starts.

The ~40-y plateau in the performance gap suggests a persistent dominance of biological influences (eg. longer limb levers, greater muscle mass, greater aerobic capacity, and lower fat mass) on performance.

Current evidence suggests that women will not swim or run as fast as men in Olympic events, which speaks against eliminating sex segregation in these individual sports. Whether hormone reassignment sufficiently levels the playing field in Olympic sports for transgender females (born and socialized male) remains an issue to be tackled by sport-governing bodies.

“‘Genius Revisited’ Revisited”, Branwen 2016

Hunter: “‘Genius Revisited’ Revisited”⁠, Gwern Branwen (2016-06-19; ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

A book study of surveys of the high-IQ elementary school HCES concludes that high IQ is not predictive of accomplishment; I point out that results are consistent with regression to the mean from extremely early IQ tests and small total sample size.

Genius Revisited documents the longitudinal results of a high-IQ/​gifted-and-talented elementary school, Hunter College Elementary School (HCES); one of the most striking results is the general high education & income levels, but absence of great accomplishment on a national or global scale (eg. a Nobel prize). The authors suggest that this may reflect harmful educational practices at their elementary school or the low predictive value of IQ.

I suggest that there is no puzzle to this absence nor anything for HCES to be blamed for, as the absence is fully explainable by their making 2 statistical errors: base-rate neglect⁠, and regression to the mean⁠.

First, their standards fall prey to a base-rate fallacy and even extreme predictive value of IQ would not predict 1 or more Nobel prizes because Nobel prize odds are measured at 1 in millions, and with a small total sample size of a few hundred, it is highly likely that there would simply be no Nobels.

Secondly, and more seriously, the lack of accomplishment is inherent and unavoidable as it is driven by the regression to the mean caused by the relatively low correlation of early childhood with adult IQs—which means their sample is far less elite as adults than they believe. Using early-childhood/​adult IQ correlations, regression to the mean implies that HCES students will fall from a mean of 157 IQ in kindergarten (when selected) to somewhere around 133 as adults (and possibly lower). Further demonstrating the role of regression to the mean, in contrast, HCES’s associated high-IQ/​gifted-and-talented high school, Hunter High, which has access to the adolescents’ more predictive IQ scores, has much higher achievement in proportion to its lesser regression to the mean (despite dilution by Hunter elementary students being grandfathered in).
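
The regression arithmetic is short enough to show directly; the middle correlation below (~0.58) is simply the value implied by the 157 → ~133 figures, the others are for comparison.

```r
## Regression to the mean: expected adult IQ of a group selected at a mean of 157
## in kindergarten, as a function of the early-childhood/adult IQ correlation.
regressed.iq <- function(child.mean, r) 100 + r * (child.mean - 100)
round(sapply(c(r0.4=0.4, r0.58=0.58, r0.7=0.7), regressed.iq, child.mean=157))
## ~123, ~133, ~140: far less elite as adults than the kindergarten scores suggest.
```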

This unavoidable statistical fact undermines the main rationale of HCES: extremely high-IQ adults cannot be accurately selected as kindergartners on the basis of a simple test. This greater-regression problem can be lessened by the use of additional variables in admissions, such as parental IQs or high-quality genetic polygenic scores⁠; unfortunately, these are either politically unacceptable or dependent on future scientific advances. This suggests that such elementary schools may not be a good use of resources and HCES students should not be assigned scarce magnet high school slots.

“When Quality Beats Quantity: Decision Theory, Drug Discovery, and the Reproducibility Crisis”, Scannell & Bosley 2016

“When Quality Beats Quantity: Decision Theory, Drug Discovery, and the Reproducibility Crisis”⁠, Jack W. Scannell, Jim Bosley (2016-02-10; ; backlinks; similar):

A striking contrast runs through the last 60 years of biopharmaceutical discovery, research, and development. Huge scientific and technological gains should have increased the quality of academic science and raised industrial R&D efficiency. However, academia faces a “reproducibility crisis”; inflation-adjusted industrial R&D costs per novel drug increased nearly 100× between 1950 and 2010; and drugs are more likely to fail in clinical development today than in the 1970s. The contrast is explicable only if powerful headwinds reversed the gains and/​or if many “gains” have proved illusory. However, discussions of reproducibility and R&D productivity rarely address this point explicitly.

The main objectives of the primary research in this paper are: (a) to provide quantitatively and historically plausible explanations of the contrast; and (b) identify factors to which R&D efficiency is sensitive.

We present a quantitative decision-theoretic model of the R&D process [a ‘leaky pipeline’⁠; cf. the log-normal]. The model represents therapeutic candidates (eg. putative drug targets, molecules in a screening library, etc.) within a “measurement space”, with candidates’ positions determined by their performance on a variety of assays (eg. binding affinity, toxicity, in vivo efficacy, etc.) whose results correlate to a greater or lesser degree. We apply decision rules to segment the space, and assess the probability of correct R&D decisions.

We find that when searching for rare positives (eg. candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/​or unknowable (ie. a 0.1 absolute change in correlation coefficient between model output and clinical outcomes in man) can offset large (eg. 10×, even 100×) changes in models’ brute-force efficiency. We also show how validity and reproducibility correlate across a population of simulated screening and disease models.
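
A toy R version of the predictive-validity point (not the paper's decision-theoretic model): a screen with slightly higher correlation to the clinical truth, run on 10× fewer candidates, still picks a better winner on average.

```r
## Screen candidates whose true clinical value correlates with assay output at
## `validity`, keep the top scorer, and record its true quality (SD units).
set.seed(1)
top.pick.quality <- function(n.candidates, validity, n.sim=2000) {
    mean(replicate(n.sim, {
        true  <- rnorm(n.candidates)
        assay <- validity*true + sqrt(1-validity^2)*rnorm(n.candidates)
        true[which.max(assay)]
    }))
}
c(brute.force = top.pick.quality(10000, 0.4),  # big screen, weaker model: ~1.5 SD
  validity    = top.pick.quality(1000,  0.5))  # 10x smaller screen, +0.1 validity: ~1.6 SD
```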

We hypothesize that screening and disease models with high predictive validity are more likely to yield good answers and good treatments, so tend to render themselves and their diseases academically and commercially redundant. Perhaps there has also been too much enthusiasm for reductionist molecular models which have insufficient predictive validity. Thus we hypothesize that the average predictive validity of the stock of academically and industrially “interesting” screening and disease models has declined over time, with even small falls able to offset large gains in scientific knowledge and brute-force efficiency. The rate of creation of valid screening and disease models may be the major constraint on R&D productivity.

“Calculating The Gaussian Expected Maximum”, Branwen 2016

Order-statistics: “Calculating The Gaussian Expected Maximum”⁠, Gwern Branwen (2016-01-22; ⁠, ; backlinks; similar):

In generating a sample of n datapoints drawn from a normal/​Gaussian distribution, how big on average the biggest datapoint is will depend on how large n is. I implement a variety of exact & approximate calculations from the literature in R to compare efficiency & accuracy.

In generating a sample of n datapoints drawn from a normal/​Gaussian distribution with a particular mean/​SD, how big on average the biggest datapoint is will depend on how large n is. Knowing this average is useful in a number of areas like sports or breeding or manufacturing, as it defines how bad/​good the worst/​best datapoint will be (eg. the score of the winner in a multi-player game).

The order statistic of the mean/​average/​expectation of the maximum of a draw of n samples from a normal distribution has no exact formula, unfortunately, and is generally not built into any programming language’s libraries.

I implement & compare some of the approaches to estimating this order statistic in the R programming language, for both the maximum and the general order statistic. The overall best approach is to calculate the exact order statistics for the n range of interest using numerical integration via lmomco and cache them in a lookup table, rescaling the mean/​SD as necessary for arbitrary normal distributions; next best is a polynomial regression approximation; finally, the Elfving correction to the Blom 1958 approximation is fast, easily implemented, and accurate for reasonably large n such as n > 100.
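
The approaches compared there are compact enough to reproduce with base R alone; a sketch of the exact integral, the Elfving-corrected Blom approximation (α = π/8), and a Monte Carlo check (the lmomco lookup-table and polynomial-regression versions are omitted):

```r
## Expected maximum of n standard normal samples.
exact.max   <- function(n) integrate(function(x) x * n * dnorm(x) * pnorm(x)^(n-1),
                                     -Inf, Inf)$value
elfving.max <- function(n) qnorm((n - pi/8) / (n - pi/4 + 1))  # Blom with alpha = pi/8
mc.max      <- function(n, n.sim=1e5) mean(replicate(n.sim, max(rnorm(n))))
n <- 100
c(exact = exact.max(n), elfving = elfving.max(n), monte.carlo = mc.max(n))  # ~2.51
## For an arbitrary Normal(mu, sd), rescale as mu + sd * E[max].
```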

“Embryo Selection For Intelligence”, Branwen 2016

Embryo-selection: “Embryo Selection For Intelligence”⁠, Gwern Branwen (2016-01-22; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

A cost-benefit analysis of the marginal cost of IVF-based embryo selection for intelligence and other traits with 2016-2017 state-of-the-art

With genetic predictors of a phenotypic trait, it is possible to select embryos during an in vitro fertilization process to increase or decrease that trait. Extending the work of Shulman & Bostrom 2014⁠/​Hsu 2014⁠, I consider the case of human intelligence using SNP-based genetic prediction, finding:

  • a meta-analysis of GCTA results indicates that SNPs can explain >33% of variance in current intelligence scores, and >44% with better-quality phenotype testing
  • this sets an upper bound on the effectiveness of SNP-based selection: a gain of 9 IQ points when selecting the top embryo out of 10
  • the best 2016 polygenic score could achieve a gain of ~3 IQ points when selecting out of 10
  • the marginal cost of embryo selection (assuming IVF is already being done) is modest, at $1,822.7 ($1,500.0 in 2016 dollars) + $243.0 ($200.0 in 2016) per embryo, with the sequencing cost projected to drop rapidly
  • a model of the IVF process, incorporating number of extracted eggs, losses to abnormalities & vitrification & failed implantation & miscarriages from 2 real IVF patient populations, estimates feasible gains of 0.39 & 0.68 IQ points
  • embryo selection is currently unprofitable (mean: −$435.0, or −$358.0 in 2016 dollars) in the USA under the lowest estimate of the value of an IQ point, but profitable under the highest (mean: $7,570.3, or $6,230.0 in 2016). The main constraint on selection profitability is the polygenic score; under the highest value, the NPV EVPI of a perfect SNP predictor is $29.2b ($24.0b in 2016) and the EVSI per education/​SNP sample is $86.3k ($71.0k in 2016)
  • under the worst-case estimate, selection can be made profitable with a better polygenic score, which would require n > 237,300 using education phenotype data (and much less using fluid intelligence measures)
  • selection can be made more effective by selecting on multiple phenotype traits: considering an example using 7 traits (IQ/​height/​BMI/​diabetes/​ADHD⁠/​bipolar/​schizophrenia), there is a factor gain over IQ alone; the outperformance of multiple selection remains after adjusting for genetic correlations & polygenic scores and using a broader set of 16 traits.
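
The headline gains can be roughly reproduced from the order-statistics formula the analysis builds on. A simplified R sketch (the essay's full model also subtracts losses from egg extraction, abnormalities, implantation failure, and miscarriage): the division by 2 reflects that embryos are siblings, so only about half the additive variance is available within a batch, and the 2016-score variance figure is an illustrative value.

```r
## Rough expected IQ gain from picking the best of n embryos on a polygenic score:
## gain ~ E[max of n] * sqrt(variance explained / 2) * 15.
exp.max <- function(n) integrate(function(x) x * n * dnorm(x) * pnorm(x)^(n-1),
                                 -Inf, Inf)$value
gain <- function(n.embryos, var.explained)
    exp.max(n.embryos) * sqrt(var.explained / 2) * 15
c(gcta.upper.bound = gain(10, 0.33),  # ~9 IQ points, matching the bound above
  pgs.2016.approx  = gain(10, 0.04))  # ~3 IQ points with an assumed ~4% variance explained
```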

“Comparing the Pearson and Spearman Correlation Coefficients across Distributions and Sample Sizes: A Tutorial Using Simulations and Empirical Data”, Winter et al 2016

2016-dewinter.pdf: “Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data”⁠, Joost C. F. de Winter, Samuel D. Gosling, Jeff Potter (2016-01-01)

“Why the Tails Come Apart”, Thrasymachus 2014

“Why the tails come apart”⁠, Thrasymachus (2014-08-01; ; backlinks; similar):

Many outcomes of interest have pretty good predictors. It seems that height correlates to performance in basketball (the average height in the NBA is around 6’7″). Faster serves in tennis improve one’s likelihood of winning. IQ scores are known to predict a slew of factors, from income⁠, to chance of being imprisoned⁠, to lifespan⁠.

What’s interesting is what happens to these relationships ‘out on the tail’: extreme outliers of a given predictor are seldom similarly extreme outliers on the outcome it predicts, and vice versa. Although 6’7″ is very tall, it lies within a couple of standard deviations of the median US adult male height—there are many thousands of US men taller than the average NBA player, yet they are not in the NBA. Although elite tennis players have very fast serves, if you look at the players serving the fastest serves ever recorded⁠, they aren’t the very best players of their time. It is harder to look at the IQ case due to test ceilings, but again there seems to be some divergence near the top: the very highest earners tend to be very smart⁠, but their intelligence is not in step with their income (their cognitive ability is around +3 to +4 SD above the mean, yet their wealth is much higher than this) (1).

The trend seems to be that even when two factors are correlated, their tails diverge: the fastest servers are good tennis players, but not the very best (and the very best players serve fast, but not the very fastest); the very richest tend to be smart, but not the very smartest (and vice versa). Why?

  • The simple graphical explanation
  • An intuitive explanation of the graphical explanation
  • A parallel geometric explanation
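
A minimal R simulation of the phenomenon under bivariate normality (r = 0.7 and the sample size are arbitrary illustrative choices):

```r
## Tails coming apart: with predictor and outcome correlated at r = 0.7, how often
## is the single best scorer on the predictor also the single best on the outcome?
set.seed(1)
r <- 0.7; n <- 10000
same.top <- replicate(2000, {
    predictor <- rnorm(n)
    outcome   <- r*predictor + sqrt(1-r^2)*rnorm(n)
    which.max(predictor) == which.max(outcome)
})
mean(same.top)  # rarely: the top of one distribution is seldom the top of the other
```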

“Statistical Notes”, Branwen 2014

Statistical-notes: “Statistical Notes”⁠, Gwern Branwen (2014-07-17; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Miscellaneous statistical stuff

Given two disagreeing polls, one small & imprecise but taken at face-value, and the other large & precise but with a high chance of being totally mistaken, what is the right Bayesian model to update on these two datapoints? I give ABC and MCMC implementations of Bayesian inference on this problem and find that the posterior is bimodal with a mean estimate close to the large unreliable poll’s estimate but with wide credible intervals to cover the mode based on the small reliable poll’s estimate.

“Spearman’s Rho for the AMH Copula: a Beautiful Formula”, Machler 2014

“Spearman’s Rho for the AMH Copula: a Beautiful Formula”⁠, Martin Mächler (2014-06; ; backlinks; similar):

We derive a beautiful series expansion for Spearman’s rho⁠, ρ(θ), of the Ali-Mikhail-Haq (AMH) copula with parameter θ (also called α). Further, via experiments we determine the cutoffs to be used for practically fast and accurate computation of ρ(θ) for all θ ∈ [−1,1].

[Keywords: Archimedean copulas, Spearman’s rho.]

“One Man’s Modus Ponens”, Branwen 2012

Modus: “One Man’s Modus Ponens”⁠, Gwern Branwen (2012-05-01; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

One man’s modus ponens is another man’s modus tollens is a saying in Western philosophy encapsulating a common response to a logical proof which generalizes the reductio ad absurdum and consists of rejecting a premise based on an implied conclusion. I explain it in more detail, provide examples, and a Bayesian gloss.

A logically-valid argument which takes the form of a modus ponens may be interpreted in several ways; a major one is to interpret it as a kind of reductio ad absurdum, where by ‘proving’ a conclusion believed to be false, one might instead take it as a modus tollens which proves that one of the premises is false. This “Moorean shift” is aphorized as the snowclone⁠, “One man’s modus ponens is another man’s modus tollens”.

The Moorean shift is a powerful counter-argument which has been deployed against many skeptical & metaphysical claims in philosophy, where often the conclusion is extremely unlikely and little evidence can be provided for the premises used in the proofs; and it is relevant to many other debates, particularly methodological ones.

“The Best And The Rest: Revisiting The Norm Of Normality Of Individual Performance”, O’Boyle & Aguinis 2012

2012-oboyle.pdf: “The Best And The Rest: Revisiting The Norm Of Normality Of Individual Performance”⁠, Ernest O’Boyle Jr., Herman Aguinis (2012-02-27; ; backlinks; similar):

We revisit a long-held assumption in human resource management, organizational behavior, and industrial and organizational psychology that individual performance follows a Gaussian (normal) distribution.

We conducted 5 studies involving 198 samples including 633,263 researchers, entertainers, politicians, and amateur and professional athletes.

Results are remarkably consistent across industries, types of jobs, types of performance measures, and time frames and indicate that individual performance is not normally distributed—instead, it follows a Paretian (power law) distribution. [This is a statistical mistake; they should also test log-normal which would likely fit many better; however, this would probably not meaningfully change the conclusions.]

Assuming normality of individual performance can lead to misspecified theories and misleading practices. Thus, our results have implications for all theories and applications that directly or indirectly address the performance of individual workers including performance measurement and management, utility analysis in pre-employment testing and training and development, personnel selection, leadership, and the prediction of performance, among others.

Figure 2: Distribution of Individual Performance for Researchers (n = 490,185), Emmy Nominees (n = 5,826), United States Representatives (n = 8,976), NBA Career Scorers (n = 3,932), and Major League Baseball (MLB) Career Errors (n = 45,885). Note: for all Y axes, “Frequency” refers to number of individuals. For clarity, individuals with more than 20 publications (Panel a) and more than 15 Emmy nominations (Panel b) were included in the last bins. For panels c–e, participants were divided into 15 equally spaced bins.

…Regarding performance measurement and management, the current zeitgeist is that the median worker should be at the mean level of performance and thus should be placed in the middle of the performance appraisal instrument. If most of those rated are in the lowest category, then the rater, measurement instrument, or both are seen as biased (ie. affected by severity bias; Cascio & Aguinis, 2011 chapter 5). Performance appraisal instruments that place most employees in the lowest category are seen as psychometrically unsound. These basic tenets have spawned decades of research related to performance appraisal that might “improve” the measurement of performance because such measurement would result in normally distributed scores given that a deviation from a normal distribution is supposedly indicative of rater bias (cf. Landy & Farr, 1980; Smither & London, 2009a). Our results suggest that the distribution of individual performance is such that most performers are in the lowest category. Based on Study 1, we discovered that nearly 2⁄3rds (65.8%) of researchers fall below the mean number of publications. Based on the Emmy-nominated entertainers in Study 2, 83.3% fall below the mean in terms of number of nominations. Based on Study 3, for U.S. representatives, 67.9% fall below the mean in terms of times elected. Based on Study 4, for NBA players, 71.1% are below the mean in terms of points scored. Based on Study 5, for MLB players, 66.3% of performers are below the mean in terms of career errors.

Moving from a Gaussian to a Paretian perspective, future research regarding performance measurement would benefit from the development of measurement instruments that, contrary to past efforts, allow for the identification of those top performers who account for the majority of results. Moreover, such improved measurement instruments should not focus on distinguishing between slight performance differences of non-elite workers. Instead, more effort should be placed on creating performance measurement instruments that are able to identify the small cohort of top performers.

As a second illustration of the implications of our results, consider the research domain of utility analysis in pre-employment testing and training and development. Utility analysis is built upon the assumption of normality, most notably with regard to the standard deviation of individual performance (SDy), which is a key component of all utility analysis equations. In their seminal article, Schmidt et al 1979 defined SDy as follows: “If job performance in dollar terms is normally distributed, then the difference between the value to the organization of the products and services produced by the average employee and those produced by an employee at the 85th percentile in performance is equal to SDy” (p. 619). The result was an estimate of $39,678.9 ($11,327.0 in 1979 dollars). What difference would a Paretian distribution of job performance make in the calculation of SDy? Consider the distribution found across all 54 samples in Study 1 and the productivity levels in this group at (a) the median, (b) 84.13th percentile, (c) 97.73rd percentile, and (d) 99.86th percentile. Under a normal distribution, these values correspond to standardized scores (z) of 0, 1, 2, and 3. The difference in productivity between the 84.13th percentile and the median was 2, thus a utility analysis assuming normality would use SDy = 2.0. A researcher at the 84th percentile should produce $39,678.9 ($11,327.0 in 1979 dollars) more output than the median researcher (adjusted for inflation). Extending to the second standard deviation, the difference in productivity between the 97.73rd percentile and median researcher should be 4, and this additional output is valued at $79,350.8 ($22,652.0 in 1979). However, the difference between the 2 points is actually 7. Thus, if SDy is 2, then the additional output of these workers is $138,877.9 ($39,645.0 in 1979) more than the median worker. Even greater disparity is found at the 99.86th percentile. Productivity difference between the 99.86th percentile and median worker should be 6.0 according to the normal distribution; instead the difference is more than quadruple that (ie. 25.0). With a normality assumption, productivity among these elite workers is estimated at $119,036.7 ($33,981.0 in 1979), ie. $39,678.9 ($11,327.0) × 3, above the median, but the productivity of these workers is actually $495,988.0 ($141,588.0 in 1979) above the median.

We chose Study 1 because of its large overall sample size, but these same patterns of productivity are found across all 5 studies. In light of our results, the value-added created by new pre-employment tests and the dollar value of training programs should be reinterpreted from a Paretian point of view that acknowledges that the differences between workers at the tails and workers at the median are considerably wider than previously thought. These are large and meaningful differences suggesting important implications of shifting from a normal to a Paretian distribution. In the future, utility analysis should be conducted using a Paretian point of view that acknowledges that differences between workers at the tails and workers at the median are considerably wider than previously thought.

…Finally, going beyond any individual research domain, a Paretian distribution of performance may help explain why despite more than a century of research on the antecedents of job performance and the countless theoretical models proposed, explained variance estimates (R²) rarely exceed 0.50 (Cascio & Aguinis, 2008b). It is possible that research conducted over the past century has not made important improvements in the ability to predict individual performance because prediction techniques rely on means and variances assumed to derive from normal distributions, leading to gross errors in the prediction of performance. As a result, even models including theoretically sound predictors and administered to a large sample will most often fail to account for even half of the variability in workers’ performance. Viewing individual performance from a Paretian perspective and testing theories with techniques that do not require the normality assumptions will allow us to improve our understanding of factors that account for and predict individual performance. Thus, research addressing the prediction of performance should be conducted with techniques that do not require the normality assumption.

“A Copula-Based Non-parametric Measure of Regression Dependence”, Dette et al 2012

2012-dette.pdf: “A Copula-Based Non-parametric Measure of Regression Dependence”⁠, Holger Dette, Karl F. Siburg, Pavel A. Stoimenov (2012-02-20; backlinks; similar):

This article presents a framework for comparing bivariate distributions according to their degree of regression dependence.

We introduce the general concept of a regression dependence order (RDO). In addition, we define a new non-parametric measure of regression dependence and study its properties. Besides being monotone in the new RDOs, the measure takes on its extreme values precisely at independence and almost sure functional dependence, respectively.

A consistent non-parametric estimator of the new measure is constructed and its asymptotic properties are investigated. Finally, the finite sample properties of the estimate are studied by means of a small simulation study.

[Keywords: conditional distribution, copula⁠, local linear estimation, measure of dependence, regression, stochastic order]

“Anime Reviews”, Branwen 2010

Anime: “Anime Reviews”⁠, Gwern Branwen (2010-12-14; ⁠, ⁠, ⁠, ; backlinks; similar):

A compilation of anime/​manga reviews since 2010.

This page is a compilation of my anime/​manga reviews; it is compiled from my MyAnimeList account & newsletter⁠. Reviews are sorted by rating in descending order.

See also my book & film /  ​ TV /  ​ theater reviews⁠.

“A New Car-following Model Yielding Log-normal Type Headways Distributions”, Li et al 2010

2010-li.pdf: “A new car-following model yielding log-normal type headways distributions”⁠, Li Li, Wang Fa, Jiang Rui, Hu Jian-Ming, Ji Yan (2010-01-01; backlinks)

“Power-law Distributions in Empirical Data”, Clauset et al 2007

“Power-law distributions in empirical data”⁠, Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman (2007-06-07; backlinks; similar):

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
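
A minimal R sketch of the core estimator for the continuous case: the maximum-likelihood exponent above a fixed x_min, plus the Kolmogorov-Smirnov distance used by the goodness-of-fit machinery. The full method also selects x_min by minimizing that distance and bootstraps a p-value, which this sketch omits.

```r
## Continuous power-law MLE above a given xmin, plus the KS distance between the
## data and the fitted power-law CDF.
fit.powerlaw <- function(x, xmin) {
    x        <- sort(x[x >= xmin])
    n        <- length(x)
    alpha    <- 1 + n / sum(log(x / xmin))  # maximum-likelihood exponent
    cdf.fit  <- 1 - (x / xmin)^(1 - alpha)  # fitted CDF at the data points
    cdf.data <- (1:n) / n                   # empirical CDF
    list(alpha = alpha, ks = max(abs(cdf.fit - cdf.data)))
}
set.seed(1)
x <- runif(10000)^(-1/(2.5 - 1))  # inverse-CDF sample from a power law, alpha = 2.5, xmin = 1
fit.powerlaw(x, xmin = 1)         # alpha-hat ~ 2.5 and a small KS distance
```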

“The Major Role of Clinicians in the Discovery of Off-Label Drug Therapies”, DeMonaco et al 2006

2006-demonaco.pdf: “The Major Role of Clinicians in the Discovery of Off-Label Drug Therapies”⁠, Harold J. DeMonaco, Ayfer Ali, Eric von Hippel (2006-03-01; ⁠, ; backlinks; similar):

Objective: To determine the role of clinicians in the discovery of off-label use of prescription drugs approved by the United States Food and Drug Administration (FDA).

Data Sources: Micromedex Healthcare Series was used to identify new uses of new molecular entities approved by the FDA in 1998, literature from January 1999–December 2003 was accessed through MEDLINE⁠, and relevant patents were identified through the U.S. Patent and Trademark Office.

Data Synthesis and Main Finding: A survey of new therapeutic uses for new molecular entity drugs approved in 1998 was conducted for the subsequent 5 years of commercial availability. During that period, 143 new applications were identified in a computerized search of the literature for the 29 new drugs considered and approved in 1998. Literature and patent searches were conducted to identify the first report of each new application. Authors of the seminal articles were contacted through an electronic survey to determine whether they were in fact the originators of the new applications. If they were, examination of article content and author surveys were used to explore if each new application was discovered through clinical practice that was independent of pharmaceutical company or university research (field discovery) or if the discovery was made by or with the involvement of pharmaceutical company or university researchers (central discovery). 82 (57%) of the 143 drug therapy innovations in our sample were discovered by practicing clinicians through field discovery.

Conclusion: To our knowledge, the major role of clinicians in the discovery of new, off-label drug therapies has not been previously documented or explored. We propose that this finding has important regulatory and health policy implications.

“Copula Associated to Order Statistics”, Anjos et al 2005

2005-dosanjos.pdf: “Copula associated to order statistics”⁠, Ulisses U. dos Anjos, Nikolai Kolev, Nelson I. Tanaka (2005-12; backlinks; similar):

We exhibit a copula representation of the (r, s)-th bivariate order statistics from an independent sample of size n. We give conditions when such a representation converges weakly to a bivariate Gaussian copula. A recurrence relationship between the density of the order statistics is presented and related Fréchet bounds are given. The usefulness of those results are stressed through examples.

[Keywords: Bivariate binomial, copula, Fréchet bounds, normal asymptotics, order statistics.]

“Computing the Distribution and Expected Value of the Concomitant Rank-Order Statistics”, Barakat & El-Shandidy 2004

2004-barakat.pdf: “Computing the Distribution and Expected Value of the Concomitant Rank-Order Statistics”⁠, H. M. Barakat, M. A. El-Shandidy (2004-01; similar):

This work gives a new representation of the distribution and expected value of the concomitant rank of order statistics.

An advantage of this representation is its ability to extend without any complexity to the multivariate case. Moreover, it gives a new direct approach to compute an approximate formula for the distribution and expected value of the concomitant rank of order statistics. Finally, an upper bound is derived for the confidence level of the tolerance region of the original bivariate (resp., multivariate) d.f., from which the sample is drawn.

[Keywords: order statistics, concomitants, ranking, tolerance region]

“Accurate Approximation to the Extreme Order Statistics of Gaussian Samples”, Chen & Tyler 1999

1999-chen.pdf: “Accurate approximation to the extreme order statistics of Gaussian samples”⁠, Chien-Chung Chen, Christopher W. Tyler (1999; backlinks; similar):

Evaluation of the integral properties of Gaussian statistics is problematic because the Gaussian function is not analytically integrable. We show that the expected value of the greatest order statistic in Gaussian samples (the max distribution) can be accurately approximated by the expression Φ−1(0.5264^(1/n)), where n is the sample size and Φ−1 is the inverse of the Gaussian cumulative distribution function. The expected value of the least order statistic in Gaussian samples (the min distribution) is correspondingly approximated by −Φ−1(0.5264^(1/n)). The standard deviation of both extreme order distributions can be approximated by the expression 0.5[Φ−1(0.8832^(1/n)) − Φ−1(0.2142^(1/n))]. We also show that the probability density function of the extreme order distribution can be well approximated by gamma distributions with appropriate parameters. These approximations are accurate, computationally efficient, and readily implemented by built-in functions in many commercial mathematical software packages such as Matlab, Mathematica, and Excel.
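
A minimal sketch of these approximations (my code, not the authors'), checked against simulation:

```python
import numpy as np
from scipy.stats import norm

def expected_gaussian_max(n):
    """Chen & Tyler approximation to E[max] of n standard-normal draws."""
    return norm.ppf(0.5264 ** (1.0 / n))

def sd_gaussian_extreme(n):
    """Approximate standard deviation of the extreme (max or min) order statistic."""
    return 0.5 * (norm.ppf(0.8832 ** (1.0 / n)) - norm.ppf(0.2142 ** (1.0 / n)))

# Monte-Carlo check (illustrative):
rng = np.random.default_rng(0)
n = 100
maxima = rng.standard_normal((20_000, n)).max(axis=1)
print(expected_gaussian_max(n), maxima.mean())   # both ≈ 2.5
print(sd_gaussian_extreme(n), maxima.std())      # both ≈ 0.43
```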

“Seeing The Forest From The Trees: When Predicting The Behavior Or Status Of Groups, Correlate Means”, Lubinski & Humphreys 1996b

1996-lubinski-2.pdf: “Seeing The Forest From The Trees: When Predicting The Behavior Or Status Of Groups, Correlate Means”⁠, David Lubinski, Lloyd G. Humphreys (1996; ; backlinks; similar):

When measures of individual differences are used to predict group performance, the reporting of correlations computed on samples of individuals invites misinterpretation and dismissal of the data. In contrast, if regression equations are used in which the correlations required are computed on bivariate means, as are the distribution statistics, it is difficult to underappreciate or lightly dismiss the utility of psychological predictors.

Given sufficient sample size and linearity of regression, this technique produces cross-validated regression equations that forecast criterion means with almost perfect accuracy. This level of accuracy is provided by correlations approaching unity between bivariate samples of predictor and criterion means, and this holds true regardless of the magnitude of the “simple” correlation (eg. rxy = 0.20, or rxy = 0.80).

We illustrate this technique empirically using a measure of general intelligence as the predictor and other measures of individual differences and socioeconomic status as criteria. In addition to theoretical applications pertaining to group trends, this methodology also has implications for applied problems aimed at developing policy in numerous fields.
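
A minimal simulation sketch of the core claim (my illustration, not the authors' data): even when the individual-level correlation is only 0.20, the correlation computed on group means of predictor and criterion approaches 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 200_000, 0.20                       # individual-level correlation of only 0.20
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)

# Form groups by binning on the predictor (10 equal-sized bins), then correlate
# the (mean x, mean y) pairs across groups.
bins = np.array_split(np.argsort(x), 10)
mean_x = np.array([x[b].mean() for b in bins])
mean_y = np.array([y[b].mean() for b in bins])

print(np.corrcoef(x, y)[0, 1])             # ≈ 0.20 for individuals
print(np.corrcoef(mean_x, mean_y)[0, 1])   # ≈ 1.0 for group means
```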

…To summarize, psychological variables generating modest correlations frequently are discounted by those who focus on the magnitude of unaccounted for criterion variance, large standard errors, and frequent false positive and false negative errors in predicting individuals. Dismissal of modest correlations (and the utility of their regressions) by professionals based on this psychometric-statistical reasoning has spread to administrators, journalists, and legislative policy makers. Some examples of this have been compiled by Dawes (1979, 1988) and Linn (1982). They range from squaring a correlation of 0.345 (ie. 0.12) and concluding that for 88% of students, “An SAT score will predict their grade rank no more accurately than a pair of dice” (cf. Linn, 1982, p. 280) to evaluating the differential utility of two correlations 0.20 and 0.40 (based on different procedures for selecting graduate students) as “twice of nothing is nothing” (cf. Dawes, 1979, p. 580).

…Tests are used, however, in ways other than the prediction of individuals or of a specific outcome for Johnny or Jane. And policy decisions based on tests frequently have broader implications for individuals beyond those directly involved in the assessment and selection context (see the discussion later in this article). For example, selection of personnel in education, business, industry, and the military focuses on the criterion performance of groups of applicants whose scores on selection instruments differ. Selection psychologists have long made use of modest predictive correlations when the ratio of applicants to openings becomes large. The relation of utility to size of correlation, relative to the selection ratio and base rate for success (if one ignores the test scores), is incorporated in the well-known Taylor-Russell (1939) tables. These tables are examples of how psychological tests have convincingly revealed economic and societal benefits (Hartigan & Wigdor 1989), even when a correlation of modest size remains at center stage. For example, given a base rate of 30% for adequate performance and a predictive validity coefficient of 0.30 within the applicant population, selecting the top 20% on the predictor test will result in 46% of hires ultimately achieving adequate performance (a 16-percentage-point gain over the base rate). To be sure, the prediction for individuals within any group is not strong—about 9% of the variance in job performance. Yet, when training is expensive or time-consuming, this can result in huge savings. For analyses of groups composed of anonymous persons, however, there is a more unequivocal way of illustrating the importance of modest correlations than even the Taylor-Russell tables provide.
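
The 46% figure can be reproduced with a Taylor-Russell-style calculation under a bivariate-normal model of predictor and criterion (a sketch under that assumption, not the original 1939 tables):

```python
from scipy.stats import norm, multivariate_normal

def success_rate_among_selected(base_rate, validity, selection_ratio):
    """P(adequate performance | selected) under a bivariate-normal predictor-criterion model."""
    y_cut = norm.ppf(1 - base_rate)        # criterion cutoff: P(Y > y_cut) = base_rate
    x_cut = norm.ppf(1 - selection_ratio)  # predictor cutoff: P(X > x_cut) = selection_ratio
    joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, validity], [validity, 1.0]])
    # P(X > x_cut, Y > y_cut) equals the joint CDF at (-x_cut, -y_cut) by symmetry of the normal.
    p_selected_and_successful = joint.cdf([-x_cut, -y_cut])
    return p_selected_and_successful / selection_ratio

print(success_rate_among_selected(base_rate=0.30, validity=0.30, selection_ratio=0.20))  # ≈ 0.46
```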

Rationale for an Alternative Approach: Applied psychologists discovered decades ago that it is more advantageous to report correlations between a continuous predictor and a dichotomous criterion graphically rather than as a number that varies between zero and one. For example, the correlation (point biserial) of about 0.40 with the pass-fail pilot training criterion and an ability-stanine predictor looks quite impressive when graphed in the manner of Figure 1a. In contrast, in Figure 1b, a scatter plot of a correlation of 0.40 between two continuous measures looks at first glance like the pattern of birdshot on a target. It takes close scrutiny to perceive that the pattern in Figure 1b is not quite circular for the small correlation. Figure 1a communicates the information more effectively than Figure 1b. When the data on the predictive validity of the pilot ability-stanine were presented in the form of Figure 1a (rather than, say, as a scatter plot of a correlation of 0.40; Figure 1b), general officers in recruitment, training, logistics, and operations immediately grasped the importance of the data for their problems. Because the Army Air Forces were an attractive career choice, there were many more applicants for pilot training than could be accommodated and selection was required…A small gain on a criterion for a unit of gain on the predictor, as long as it is predicted with near-perfect accuracy, can have high utility.

Figure 1. a: Percentage of pilots eliminated from a training class as a function of pilot aptitude rating in stanines. Number of trainees in each stanine is shown on each bar. (From DuBois 1947). b: A synthetic example of a correlation of 0.40 (n = 400).

“Fairness in Employment Testing: Validity Generalization, Minority Issues, and the General Aptitude Test Battery”, Hartigan & Wigdor 1989

1989-hartigan-fairnessinemploymenttesting.pdf: “Fairness in Employment Testing: Validity Generalization, Minority Issues, and the General Aptitude Test Battery”⁠, John A. Hartigan, Alexandra K. Wigdor (1989; backlinks; similar):

Declining American competitiveness in world economic markets has renewed interest in employment testing as a way of putting the right workers in the right jobs. A new study of the U.S. Department of Labor’s General Aptitude Test Battery (GATB) Referral System sheds light on key questions for America’s employers: How well does the GATB predict job success? Are there scientific justifications for adjusting minority test scores? Will increased use of the GATB result in substantial increases in productivity?

Fairness in Employment Testing evaluates both the validity generalization techniques used to justify the use of the GATB across the spectrum of U.S. jobs and the policy of adjusting test scores to promote equal opportunity.


This volume is one of a number of studies conducted under the aegis of the National Research Council/​National Academy of Sciences that deal with the use of standardized ability tests to make decisions about people in employment or educational settings. Because such tests have a sometimes important role in allocating opportunities in American society, their use is quite rightly subject to questioning and not infrequently to legal scrutiny. At issue in this report is the use of a federally sponsored employment test, the General Aptitude Test Battery (GATB), to match job seekers to requests for job applicants from private-sector and public-sector employers. Developed in the late 1940s by the U.S. Employment Service (USES), a division of the Department of Labor, the GATB is used for vocational counseling and job referral by state-administered Employment Service (also known as Job Service) offices located in some 1,800 communities around the country.


  • Front Matter
  • Summary
  1. The Policy Context
  2. Issues in Equity and Law
  3. The Public Employment Service
  4. The GATB: Its Character and Psychometric Properties
  5. Problematic Features of the GATB: Test Administration, Speedness, and Coachability
  6. The Theory of Validity Generalization
  7. Validity Generalization Applied to the GATB
  8. GATB Validities
  9. Differential Validity and Differential Prediction
  10. The VG-GATB Program: Concept, Promotion, and Implementation
  11. In Whose Interest: Potential Effects of the VG-GATB Referral System
  12. Evaluation of Economic Claims
  13. Recommendations for Referral and Score Reporting
  14. Central Recommendations
  • References
  • Appendix A: A Synthesis of Research on Some Psychometric Properties of the GATB
  • Appendix B: Tables Summarizing GATB Reliabilities
  • Appendix C: Biographical Sketches, Committee Members and Staff
  • Index

“Forecasting Records by Maximum Likelihood”, Smith 1988

1988-smith.pdf: “Forecasting Records by Maximum Likelihood”⁠, Richard L. Smith (1988; ; similar):

A maximum likelihood method of fitting a model to a series of records is proposed, using ideas from the analysis of censored data to construct a likelihood function based on observed records.

This method is tried out by fitting several models to series of athletics records for mile and marathon races. A form of residual analysis is proposed for testing the models. Forecasting consequences are also considered.

In the case of mile records⁠, a steady linear improvement since 1931 is found. The marathon data are harder to interpret, showing a steady improvement until 1965 but only slight improvement in world records since then.

In both cases, the normal distribution appears at least as good as extreme-value distributions for the distribution of annual best performances. Short-term forecasts appear satisfactory, but serious reservations are expressed about using regression-type methods to predict long-term performance limits.

[Keywords: athletics records, censored data, generalized extreme-value distribution, Gumbel distribution⁠, inference for stochastic processes]

“The Asymptotic Theory of Extreme Order Statistics, Second Edition”, Galambos 1987

1987-galambos-theasymptotictheoryofextremeorderstatistics2nded.pdf: “The Asymptotic Theory of Extreme Order Statistics, Second Edition”⁠, Janos Galambos (1987-01-01; backlinks)

“Expected Normal Order Statistics (Exact and Approximate)”, Royston 1982

1982-royston.pdf: “Expected Normal Order Statistics (Exact and Approximate)”⁠, J. P. Royston (1982-01-01; backlinks)

“Impact of Valid Selection Procedures on Work-force Productivity”, Schmidt et al 1979

1979-schmidt.pdf: “Impact of valid selection procedures on work-force productivity”⁠, Frank L. Schmidt, J. E. Hunter, R. C. McKenzie, T. W. Muldrow (1979; ; backlinks; similar):

Used decision theoretic equations to estimate the impact of the Programmer Aptitude Test (PAT) on productivity if used to select new computer programmers for 1 yr in the federal government and the national economy. A newly developed technique was used to estimate the standard deviation of the dollar value of employee job performance, which in the past has been the most difficult and expensive item of required information. For the federal government and the US economy separately, results are presented for different selection ratios and for different assumed values for the validity of previously used selection procedures. The impact of the PAT on programmer productivity was substantial for all combinations of assumptions. Results support the conclusion that hundreds of millions of dollars in increased productivity could be realized by increasing the validity of selection decisions in this occupation. Similarities between computer programmers and other occupations are discussed. It is concluded that the impact of valid selection procedures on work-force productivity is considerably greater than most personnel psychologists have believed.
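
The style of calculation involved can be sketched with the standard Brogden-Cronbach-Gleser utility equation on which such analyses are based (a hedged illustration with hypothetical numbers, not the paper's exact figures):

```python
from scipy.stats import norm

def utility_gain_per_hire(validity_new, validity_old, sd_dollar, selection_ratio):
    """Expected yearly gain in dollar-valued performance per person hired when selection
    validity rises, via the Brogden-Cronbach-Gleser equation:
    gain = (r_new - r_old) * SD_y * phi(z_cut) / selection_ratio."""
    z_cut = norm.ppf(1 - selection_ratio)
    mean_predictor_score_of_hires = norm.pdf(z_cut) / selection_ratio
    return (validity_new - validity_old) * sd_dollar * mean_predictor_score_of_hires

# Hypothetical numbers: validity rises from 0.20 to 0.60, SD of performance in dollars = $10,000,
# top 10% of applicants hired:
print(utility_gain_per_hire(0.60, 0.20, 10_000, 0.10))   # ≈ $7,000 per hire per year
```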

“Asymptotic Independence of Certain Statistics Connected With the Extreme Order Statistics in a Bivariate Distribution”, Srivastava 1967

1967-srivastava.pdf: “Asymptotic Independence of Certain Statistics Connected with the Extreme Order Statistics in a Bivariate Distribution”⁠, O. P. Srivastava (1967-06; backlinks; similar):

The exact distribution of extremes in a sample and its limiting forms are well known in the univariate case. The limiting form for the largest observation in a sample was derived by Fisher and Tippett (1928) as early as 1927 by a functional equation, and that for the smallest was studied by Smirnov (1952). Though the joint distribution of two extremes has not been fully studied yet, Sibuya (1960) gave a necessary and sufficient condition for the asymptotic independence of two largest extremes in a bivariate distribution. In this paper a necessary and sufficient condition for the asymptotic independence of two smallest observations in a bivariate sample has been derived, and the result has been used to find the condition for the asymptotic independence of any pair of extreme order statistics, one in each component of the bivariate sample. This result is further extended to find the condition for asymptotic independence of the pair of distances between two order statistics, arising from each component.

“Estimating Bounds on Athletic Performance”, Deakin 1967

1967-deakin.pdf: “Estimating Bounds on Athletic Performance”⁠, Michael Deakin (1967-05-01; similar):

A sequence of athletic records forms by definition a monotonic sequence. Since it is reasonable to assume this to be bounded, it follows that a limit must exist to future performance. It has been proposed [1, 2] that this can be estimated by use of a curve-fitting procedure on existing data, such as may be found in many compendia of athletic records (eg. [3]).

“Asymptotic Independence of Bivariate Extremes”, Mardia 1964

1964-mardia.pdf: “Asymptotic Independence of Bivariate Extremes”⁠, K. V. Mardia (1964-11-01; backlinks; similar):

Sibuya (1960) has given a necessary and sufficient condition for asymptotic independence of two extremes for a sample from a bivariate population. We shall obtain such a condition for the asymptotic independence of all four extremes X, X’, Y and Y’. It assumes a very simple form when f(x, y) is symmetrical in x and y, and the marginal p.d.f.s of x and y have the same form. Under these conditions on the p.d.f., a modification is possible in the condition given by Sibuya (1960), which reduces to one given by Watson (1954) for another purpose. It is further shown that the extremes for samples from a bivariate normal population satisfy our condition if |ρ| < 1, where ρ is the population correlation coefficient. Geffroy (1958) and Sibuya (1960) have proved a particular result for the asymptotic independence of only two extremes X and Y in the normal case.
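
A small simulation sketch of the bivariate-normal result (my illustration; the convergence is known to be slow): whenever |ρ| < 1, the correlation between the two componentwise maxima decreases as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]

for n in (10, 100, 1000):
    draws = rng.multivariate_normal([0.0, 0.0], cov, size=(5_000, n))  # shape (5000, n, 2)
    maxima = draws.max(axis=1)                                         # (max X, max Y) per sample
    print(n, np.corrcoef(maxima[:, 0], maxima[:, 1])[0, 1])            # shrinks as n grows
```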

“Bivariate Extreme Statistics, I”, Sibuya 1960

“Bivariate Extreme Statistics, I”⁠, Masaaki Sibuya (1960; backlinks; similar):

The largest and the smallest value in a sample, and other statistics related to them, are generally named extreme statistics. Their sampling distributions, especially the limit distributions, have been studied by many authors, and the principal results are summarized in the recent Gumbel book.

The author extends here the notion of extreme statistics into bivariate distributions and considers the joint distributions of maxima of components in sample vectors. This Part I treats asymptotic properties of the joint distributions.

In the univariate case the limit distributions of the sample maximum are limited to only three types. In the bivariate case, however, the types of the limit joint distribution are more varied: Theorem 5 in Chapter 2 shows that infinitely many types of limit distributions may exist. For a wide class of distributions, the two maxima are asymptotically independent or degenerate on a curve. Theorems 2 and 4 give the attraction domains for such limits. In the bivariate normal case, the two maxima are asymptotically independent unless the correlation coefficient is equal to one.

Throughout these arguments we consider only the dependence between the marginal distributions, whose behaviors are well established. For this purpose the fundamental notion of a ‘dependence function’ is introduced and discussed in Section 1.

“Statistical Estimates and Transformed Beta-Variables”, Blom 1958

1958-blom-orderstatistics.pdf: “Statistical Estimates and Transformed Beta-Variables”⁠, Gunnar Blom (1958-01-01; backlinks)

“On the Statistics of Individual Variations of Productivity in Research Laboratories”, Shockley 1957

1957-shockley.pdf: “On the Statistics of Individual Variations of Productivity in Research Laboratories”⁠, William Shockley (1957; ; backlinks; similar):

It is well-known that some workers in scientific research laboratories are enormously more creative than others.

If the number of scientific publications is used as a measure of productivity, it is found that some individuals create new science at a rate at least 50× greater than others. Thus differences in rates of scientific production are much bigger than differences in the rates of performing simpler acts, such as the rate of running the mile, or the number of words a man can speak per minute. On the basis of statistical studies of rates of publication, it is found that it is more appropriate to consider not simply the rate of publication but its logarithm. The logarithm appears to have a normal distribution over the population of typical research laboratories. The existence of a “log-normal distribution” suggests that the logarithm of the rate of production is a manifestation of some fairly fundamental mental attribute.

The great variation in rate of production from one individual to another can be explained on the basis of simplified models of the mental processes concerned. The common feature in the models is that a large number of factors are involved so that small changes in each, all in the same direction, may result in a very large [multiplicative] change in output. For example, the number of ideas a scientist can bring into awareness at one time may control his ability to make an invention and his rate of invention may increase very rapidly with this number.
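
A minimal illustration of the multiplicative model (not Shockley's own calculation, and with arbitrary parameter choices): multiplying several modestly varying factors produces an approximately log-normal distribution of output rates with very large person-to-person ratios.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_factors = 100_000, 8

# Each factor varies modestly; output is the product of all factors, so log-output is a
# sum of independent terms and is therefore approximately normally distributed.
factors = rng.lognormal(mean=0.0, sigma=0.4, size=(n_people, n_factors))
rate = factors.prod(axis=1)

log_rate = np.log(rate)
print(log_rate.mean(), log_rate.std())             # ≈ 0 and ≈ 0.4 * sqrt(8) ≈ 1.13
print(np.quantile(rate, 0.99) / np.median(rate))   # top 1% produce ≈ 10-15× the median rate
```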

“The Classification Program”, Dubois 1947-page-143

1947-dubois-theclassificationprogram.pdf#page=143: “The Classification Program”⁠, Philip H. Dubois (1947-01-01; backlinks)

“The Asymptotical Distribution of Range in Samples from a Normal Population”, Elfving 1947

1947-elfving.pdf: “The Asymptotical Distribution of Range in Samples from a Normal Population”⁠, G. Elfving (1947; backlinks; similar):

Consider a sample of n observations, taken from an infinite normal population with the mean 0 and the standard deviation 1. Let a be the smallest and b the greatest of the observed values. Then w = b − a is the range of the sample. For certain statistical purposes knowledge of the sampling distribution of range is needed. The distribution function, however, involves a rather complicated integral, whose exact calculation is, for n > 2, impossible…it seems to be at least of theoretical interest to investigate the asymptotical distribution of range for n → ∞. This is the purpose of the present paper.

“Ability and Income”, Burt 1943-page-14

1943-burt.pdf#page=14: “Ability and Income”⁠, Cyril Burt (1943-01-01; ; backlinks)

Miscellaneous