notes/Pipeline (Link Bibliography)

“notes/​Pipeline” links:

  1. 2010-li-quincunx-lognormal.png

  2. 2010-li.pdf: “A new car-following model yielding log-normal type headways distributions”⁠, Li Li (李 力), Wang Fa (王 法), Jiang Rui (姜 锐), Hu Jian-Ming (胡坚明), Ji Yan (吉 岩)

  4. 1957-shockley.pdf: ⁠, William Shockley (1957; iq):

    It is well-known that some workers in scientific research laboratories are enormously more creative than others. If the number of scientific publications is used as a measure of productivity, it is found that some individuals create new science at a rate at least 50 times greater than others. Thus differences in rates of scientific production are much bigger than differences in the rates of performing simpler acts, such as the rate of running the mile, or the number of words a man can speak per minute. On the basis of statistical studies of rates of publication, it is found that it is more appropriate to consider not simply the rate of publication but its logarithm. The logarithm appears to have a normal distribution over the population of typical research laboratories. The existence of a “log-normal distribution” suggests that the logarithm of the rate of production is a manifestation of some fairly fundamental mental attribute. The great variation in rate of production from one individual to another can be explained on the basis of simplified models of the mental processes concerned. The common feature in the models is that a large number of factors are involved so that small changes in each, all in the same direction, may result in a very large [multiplicative] change in output. For example, the number of ideas a scientist can bring into awareness at one time may control his ability to make an invention and his rate of invention may increase very rapidly with this number.
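
    The multiplicative model described in this abstract can be sketched with a small simulation. This is only an illustration under stated assumptions: the number of factors (30), the uniform range of each factor, and the function name `simulate_output` are hypothetical and do not come from Shockley's paper.

```python
import math
import random
import statistics

def simulate_output(n_people, n_factors, rng, spread=0.3):
    """Each person's output is the product of n_factors independent factors.

    Each factor is drawn uniformly from [1 - spread, 1 + spread]; these
    values are hypothetical and chosen only for illustration.
    """
    outputs = []
    for _ in range(n_people):
        product = 1.0
        for _ in range(n_factors):
            product *= rng.uniform(1.0 - spread, 1.0 + spread)
        outputs.append(product)
    return outputs

rng = random.Random(0)
outputs = simulate_output(n_people=20_000, n_factors=30, rng=rng)
logs = [math.log(x) for x in outputs]

mean_out, median_out = statistics.mean(outputs), statistics.median(outputs)
mean_log, median_log = statistics.mean(logs), statistics.median(logs)
ratio_top_to_median = max(outputs) / median_out

print(f"output:     mean={mean_out:.3f} median={median_out:.3f}")
print(f"log output: mean={mean_log:.3f} median={median_log:.3f}")
print(f"best / median = {ratio_top_to_median:.1f}x")
```

    The raw outputs are right-skewed (mean above median), while their logarithms are roughly symmetric, which is the log-normal signature the abstract describes.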

  5. 2012-oboyle.pdf: , Ernest O'Boyle Junior, Herman Aguinis (2012-02-27; economics):

    We revisit a long-held assumption in human resource management, organizational behavior, and industrial and organizational psychology that individual performance follows a Gaussian (normal) distribution.

    We conducted 5 studies involving 198 samples including 633,263 researchers, entertainers, politicians, and amateur and professional athletes.

    Results are remarkably consistent across industries, types of jobs, types of performance measures, and time frames and indicate that individual performance is not normally distributed; instead, it follows a Paretian (power law) distribution. [This is a statistical mistake; they should also test the log-normal, which would likely fit many of the samples better; however, this would probably not meaningfully change the conclusions.]

    Assuming normality of individual performance can lead to misspecified theories and misleading practices. Thus, our results have implications for all theories and applications that directly or indirectly address the performance of individual workers including performance measurement and management, utility analysis in pre-employment testing and training and development, personnel selection, leadership, and the prediction of performance, among others.

    Figure 2: Distribution of Individual Performance for Researchers (n = 490,185), Emmy Nominees (n = 5,826), United States Representatives (n = 8,976), NBA Career Scorers (n = 3,932), and Major League Baseball (MLB) Career Errors (n = 45,885). Note: for all Y axes, “Frequency” refers to number of individuals. For clarity, individuals with more than 20 publications (Panel a) and more than 15 Emmy nominations (Panel b) were included in the last bins. For panels c–e, participants were divided into 15 equally spaced bins.

    …Regarding performance measurement and management, the current zeitgeist is that the median worker should be at the mean level of performance and thus should be placed in the middle of the performance appraisal instrument. If most of those rated are in the lowest category, then the rater, measurement instrument, or both are seen as biased (ie., affected by severity bias; Cascio & Aguinis, 2011 chapter 5). Performance appraisal instruments that place most employees in the lowest category are seen as psychometrically unsound. These basic tenets have spawned decades of research related to performance appraisal that might “improve” the measurement of performance because such measurement would result in normally distributed scores given that a deviation from a normal distribution is supposedly indicative of rater bias (cf. Landy & Farr, 1980; Smither & London, 2009a). Our results suggest that the distribution of individual performance is such that most performers are in the lowest category. Based on Study 1, we discovered that nearly 2⁄3rds (65.8%) of researchers fall below the mean number of publications. Based on the Emmy-nominated entertainers in Study 2, 83.3% fall below the mean in terms of number of nominations. Based on Study 3, for U.S. representatives, 67.9% fall below the mean in terms of times elected. Based on Study 4, for NBA players, 71.1% are below the mean in terms of points scored. Based on Study 5, for MLB players, 66.3% of performers are below the mean in terms of career errors.
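
    The "share below the mean" figures can be related to a log-normal alternative with a short calculation. For a log-normal variable, the share of the population below the mean is Φ(σ/2), where σ is the standard deviation on the log scale and Φ is the standard normal CDF; for a normal variable it is always 50%. The sketch below inverts that relation for the quoted shares. It assumes a log-normal purely for illustration (the paper did not fit one), and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def share_below_mean_lognormal(sigma):
    """P(X < E[X]) for X ~ LogNormal(mu, sigma); independent of mu."""
    return STD_NORMAL.cdf(sigma / 2.0)

def sigma_for_share_below_mean(share):
    """Invert the relation: the log-scale sigma implied by a given share."""
    return 2.0 * STD_NORMAL.inv_cdf(share)

# Shares below the mean reported for the five studies quoted above.
reported = {
    "researchers": 0.658,
    "Emmy nominees": 0.833,
    "US representatives": 0.679,
    "NBA players": 0.711,
    "MLB players": 0.663,
}
for group, share in reported.items():
    sigma = sigma_for_share_below_mean(share)
    print(f"{group}: implied log-scale sigma ~ {sigma:.2f}")
```

    This is not a fit to the underlying data; it only shows how much log-scale spread a log-normal would need to reproduce each reported share.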

    Moving from a Gaussian to a Paretian perspective, future research regarding performance measurement would benefit from the development of measurement instruments that, contrary to past efforts, allow for the identification of those top performers who account for the majority of results. Moreover, such improved measurement instruments should not focus on distinguishing between slight performance differences of non-elite workers. Instead, more effort should be placed on creating performance measurement instruments that are able to identify the small cohort of top performers.

    As a second illustration of the implications of our results, consider the research domain of utility analysis in pre-employment testing and training and development. Utility analysis is built upon the assumption of normality, most notably with regard to the standard deviation of individual performance (SDy), which is a key component of all utility analysis equations. In their seminal article, Schmidt et al. (1979) defined SDy as follows: “If job performance in dollar terms is normally distributed, then the difference between the value to the organization of the products and services produced by the average employee and those produced by an employee at the 85th percentile in performance is equal to SDy” (p. 619). The result was an estimate of $37,598 ($11,327 in 1979 dollars). What difference would a Paretian distribution of job performance make in the calculation of SDy? Consider the distribution found across all 54 samples in Study 1 and the productivity levels in this group at (a) the median, (b) 84.13th percentile, (c) 97.73rd percentile, and (d) 99.86th percentile. Under a normal distribution, these values correspond to standardized scores (z) of 0, 1, 2, and 3. The difference in productivity between the 84.13th percentile and the median was 2, thus a utility analysis assuming normality would use SDy = 2.0. A researcher at the 84th percentile should produce $37,598 ($11,327 in 1979 dollars) more output than the median researcher (adjusted for inflation). Extending to the second standard deviation, the difference in productivity between the 97.73rd percentile and median researcher should be 4, and this additional output is valued at $75,189 ($22,652 in 1979 dollars). However, the difference between the 2 points is actually 7. Thus, if SDy is 2, then the additional output of these workers is $131,594 ($39,645 in 1979 dollars) more than the median worker. Even greater disparity is found at the 99.86th percentile. Productivity difference between the 99.86th percentile and median worker should be 6.0 according to the normal distribution; instead the difference is more than quadruple that (ie., 25.0). With a normality assumption, productivity among these elite workers is estimated at $112,793 ($33,981 in 1979 dollars, ie., $11,327 × 3) above the median, but the productivity of these workers is actually $469,974 ($141,588 in 1979 dollars) above the median.
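
    The arithmetic in this passage can be reproduced in a few lines. The sketch uses only numbers quoted above (SDy of $11,327 in 1979 dollars, corresponding to 2 units of productivity, and observed gaps of 2, 7, and 25 units); the function names are hypothetical.

```python
SDY_DOLLARS_1979 = 11_327.0  # dollar value of one SDy, as quoted above
SDY_UNITS = 2.0              # productivity gap between z = 1 and the median

# Observed productivity gaps from the median at z = 1, 2, 3 (Study 1, as quoted).
OBSERVED_UNITS = {1: 2.0, 2: 7.0, 3: 25.0}

def value_if_normal(z):
    """Dollar gap from the median if output scaled linearly with z."""
    return z * SDY_DOLLARS_1979

def value_observed(z):
    """Dollar gap implied by the observed productivity difference."""
    return OBSERVED_UNITS[z] * SDY_DOLLARS_1979 / SDY_UNITS

for z in sorted(OBSERVED_UNITS):
    normal, observed = value_if_normal(z), value_observed(z)
    print(f"z={z}: normal ${normal:,.0f}, observed ${observed:,.0f}, "
          f"ratio {observed / normal:.2f}")
```

    The results agree with the quoted 1979-dollar figures to within a dollar or two, presumably a rounding difference in the source.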

    We chose Study 1 because of its large overall sample size, but these same patterns of productivity are found across all 5 studies. In light of our results, the value-added created by new pre-employment tests and the dollar value of training programs should be reinterpreted from a Paretian point of view that acknowledges that the differences between workers at the tails and workers at the median are considerably wider than previously thought. These are large and meaningful differences suggesting important implications of shifting from a normal to a Paretian distribution. In the future, utility analysis should be conducted using a Paretian point of view that acknowledges that differences between workers at the tails and workers at the median are considerably wider than previously thought.

    …Finally, going beyond any individual research domain, a Paretian distribution of performance may help explain why despite more than a century of research on the antecedents of job performance and the countless theoretical models proposed, explained estimates (R²) rarely exceed 0.50. It is possible that research conducted over the past century has not made important improvements in the ability to predict individual performance because prediction techniques rely on means and variances assumed to derive from normal distributions, leading to gross errors in the prediction of performance. As a result, even models including theoretically sound predictors and administered to a large sample will most often fail to account for even half of the variability in workers’ performance. Viewing individual performance from a Paretian perspective and testing theories with techniques that do not require the normality assumptions will allow us to improve our understanding of factors that account for and predict individual performance. Thus, research addressing the prediction of performance should be conducted with techniques that do not require the normality assumption.

  7. 1943-burt.pdf#page=14: “Ability and Income”⁠, Cyril Burt

  8. 1980-jensen-biasinmentaltesting.pdf#page=107

  9. 1996-jensen.pdf#page=6

  10. “When Quality Beats Quantity: Decision Theory, Drug Discovery, and the Reproducibility Crisis”, Jack W. Scannell, Jim Bosley (2016-02-10):

    A striking contrast runs through the last 60 years of biopharmaceutical discovery, research, and development. Huge scientific and technological gains should have increased the quality of academic science and raised industrial R&D efficiency. However, academia faces a “reproducibility crisis”; inflation-adjusted industrial R&D costs per novel drug increased nearly 100× between 1950 and 2010; and drugs are more likely to fail in clinical development today than in the 1970s. The contrast is explicable only if powerful headwinds reversed the gains and/or if many “gains” have proved illusory. However, discussions of reproducibility and R&D productivity rarely address this point explicitly.

    The main objectives of the primary research in this paper are: (a) to provide quantitatively and historically plausible explanations of the contrast; and (b) identify factors to which R&D efficiency is sensitive.

    We present a quantitative decision-theoretic model of the R&D process [a ‘leaky pipeline’⁠; cf the log-normal]. The model represents therapeutic candidates (eg., putative drug targets, molecules in a screening library, etc.) within a “measurement space”, with candidates’ positions determined by their performance on a variety of assays (eg., binding affinity, toxicity, in vivo efficacy, etc.) whose results correlate to a greater or lesser degree. We apply decision rules to segment the space, and assess the probability of correct R&D decisions.

    We find that when searching for rare positives (eg., candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/or unknowable (ie., a 0.1 absolute change in correlation coefficient between model output and clinical outcomes in man) can offset large (eg., 10×, even 100×) changes in models’ brute-force efficiency. We also show how validity and reproducibility correlate across a population of simulated screening and disease models.

    We hypothesize that screening and disease models with high predictive validity are more likely to yield good answers and good treatments, so tend to render themselves and their diseases academically and commercially redundant. Perhaps there has also been too much enthusiasm for reductionist molecular models which have insufficient predictive validity. Thus we hypothesize that the average predictive validity of the stock of academically and industrially “interesting” screening and disease models has declined over time, with even small falls able to offset large gains in scientific knowledge and brute-force efficiency. The rate of creation of valid screening and disease models may be the major constraint on R&D productivity.
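
    The trade-off between predictive validity and brute-force throughput can be explored with a toy Monte Carlo. This is a simplified sketch, not the authors' decision-theoretic model: it assumes true quality and screening score are standard bivariate normal with correlation `rho`, takes only the single top-scoring candidate forward, and defines a "true positive" as quality above a hypothetical threshold of 2 standard deviations.

```python
import math
import random

def hit_rate(rho, n_screened, n_trials, rng, threshold_z=2.0):
    """Fraction of trials in which the top-scoring candidate is a true positive.

    Toy assumption: true quality and screening score are standard bivariate
    normal with correlation rho; a "true positive" has quality > threshold_z.
    """
    noise_scale = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n_trials):
        best_score, best_quality = -math.inf, 0.0
        for _ in range(n_screened):
            quality = rng.gauss(0.0, 1.0)
            score = rho * quality + noise_scale * rng.gauss(0.0, 1.0)
            if score > best_score:
                best_score, best_quality = score, quality
        if best_quality > threshold_z:
            hits += 1
    return hits / n_trials

rng = random.Random(1)
for rho in (0.3, 0.4, 0.5):
    for n_screened in (10, 100, 1000):
        rate = hit_rate(rho, n_screened, n_trials=200, rng=rng)
        print(f"rho={rho:.1f} screened={n_screened:>4}: hit rate {rate:.3f}")
```

    Comparing rows with the same hit rate shows how much extra screening a small loss of validity would require in this toy setting. The estimates are noisy at 200 trials per cell, so raise `n_trials` before reading much into any single number.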

  11. ⁠, Andreas Bender, Isidro Cortés-Ciriano (2021-02):

    We first attempted to simulate the effect of (1) speeding up phases in the drug discovery process, (2) making them cheaper and (3) making individual phases more successful on the overall financial outcome of drug-discovery projects. In every case, an improvement of the respective measure (speed, cost and success of phase) of 20% (in the case of failure rate in relative terms) has been assumed to quantify effects on the capital cost of bringing one successful drug to the market. For the simulations, a patent lifetime of 20 years was assumed, with patent applications filed at the start of clinical Phase I, and the net effect of changes of speed, cost and quality of decisions on overall project return was calculated, assuming that projects, on average, are able to return their own cost…(Studies such as [], which posed the question of which changes are most efficient in terms of improving R&D productivity, returned similar results to those presented here, although we have quantified them in more detail.)

    It can be seen in Figure 2 that a reduction of the failure rate (in particular across all clinical phases) has by far the most substantial impact on project value overall, multiple times that of a reduction of the cost of a particular phase or a decrease in the amount of time a particular phase takes. This effect is most profound in clinical Phase II, in agreement with previous studies [33], and it is a result of the relatively low success rate, long duration and high cost of the clinical phases. In other words, increasing the success of clinical phases decreases the number of expensive clinical trials needed to bring a drug to the market, and this decrease in the number of failures matters more than failing more quickly or more cheaply in terms of cost per successful, approved drug.

    Figure 2: The impact of increasing speed (with the time taken for each phase reduced by 20%), improving the quality of the compounds tested in each phase (with the failure rate reduced by 20%), and decreasing costs (by 20%) on the net profit of a drug-discovery project, assuming patenting at time of first in human tests, and with other assumptions based on [“When Quality Beats Quantity: Decision Theory, Drug Discovery, and the Reproducibility Crisis”, Scannell & Bosley 2016]. It can be seen that the quality of compounds taken forward has a much more profound impact on the success of projects, far beyond improving the speed and reducing the cost of the respective phase. This has implications for the most beneficial uses of AI in drug-discovery projects.
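
    The qualitative result in Figure 2 can be illustrated with a minimal expected-cost model. This is a sketch under loudly stated assumptions: the phase costs and success probabilities below are hypothetical placeholders (not the paper's inputs), and the model ignores time, discounting, and patent life, all of which the paper's simulation includes.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Phase:
    name: str
    cost: float        # cost of running one candidate through the phase
    p_success: float   # probability the candidate passes the phase

# Hypothetical placeholder numbers, NOT taken from the paper.
BASELINE = (
    Phase("Phase I", cost=25.0, p_success=0.60),
    Phase("Phase II", cost=60.0, p_success=0.35),
    Phase("Phase III", cost=250.0, p_success=0.60),
)

def cost_per_approval(phases):
    """Expected spend per approved drug, ignoring time and discounting."""
    total = 0.0
    for i, phase in enumerate(phases):
        # Probability that a candidate entering phase i is eventually approved.
        p_remaining = 1.0
        for later in phases[i:]:
            p_remaining *= later.p_success
        total += phase.cost / p_remaining
    return total

def modify(phases, index, **changes):
    """Return a copy of the pipeline with one phase altered."""
    return tuple(replace(p, **changes) if i == index else p
                 for i, p in enumerate(phases))

phase2 = BASELINE[1]
cheaper = modify(BASELINE, 1, cost=phase2.cost * 0.8)
# A 20% relative reduction in the failure rate, as in the quoted simulation.
better = modify(BASELINE, 1, p_success=1.0 - 0.8 * (1.0 - phase2.p_success))

for label, pipeline in (("baseline", BASELINE),
                        ("20% cheaper Phase II", cheaper),
                        ("20% fewer Phase II failures", better)):
    print(f"{label}: {cost_per_approval(pipeline):.1f}")
```

    With these placeholder numbers, reducing Phase II failures lowers the cost per approval more than cutting Phase II cost by the same percentage, because a higher success rate also reduces how many candidates must pass through every earlier phase.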

    …When translating this to drug-discovery programmes, this means that AI needs to support:

    1. better compounds going into clinical trials (related to the structure itself, but also including the right dosing/PK for suitable efficacy versus the safety/therapeutic index, in the desired target tissue);
    2. better validated targets (to decrease the number of failures owing to efficacy, especially in clinical Phases II and III, which have a profound impact on overall project success and in which target validation is currently probably not yet where one would like it to be []);
    3. better patient selection (eg., using biomarkers) []; and
    4. better conductance of trials (with respect to, eg., patient recruitment and adherence) [36].

    This finding is in line with previous research in the area cited already [33], as well as a study that compared the impact of the quality of decisions that can be made to the number of compounds that can be processed with a particular technique [30]. In this latter case, the authors found that: “when searching for rare positives (eg., candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/or unknowable (ie., a 0.1 absolute change in correlation coefficient between model output and clinical outcomes in man) can offset large (eg., tenfold, even 100-fold) changes in models’ brute-force efficiency.” Still, currently the main focus of AI in drug discovery, in many cases, seems to be on speed and cost, as opposed to the quality of decisions.

  15. ⁠, Anders Sandberg, Eric Drexler, Toby Ord (2018-06-06):

    The Fermi paradox is the conflict between an expectation of a high ex ante probability of intelligent life elsewhere in the universe and the apparently lifeless universe we in fact observe. The expectation that the universe should be teeming with intelligent life is linked to models like the Drake equation, which suggest that even if the probability of intelligent life developing at a given site is small, the sheer multitude of possible sites should nonetheless yield a large number of potentially observable civilizations. We show that this conflict arises from the use of Drake-like equations, which implicitly assume certainty regarding highly uncertain parameters. We examine these parameters, incorporating models of chemical and genetic transitions on paths to the origin of life, and show that extant scientific knowledge corresponds to uncertainties that span multiple orders of magnitude. This makes a stark difference. When the model is recast to represent realistic distributions of uncertainty, we find a substantial ex ante probability of there being no other intelligent life in our observable universe, and thus that there should be little surprise when we fail to detect any signs of it. This result dissolves the Fermi paradox, and in doing so removes any need to invoke speculative mechanisms by which civilizations would inevitably fail to have observable effects upon the universe.
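
    The contrast between a point estimate and a full uncertainty distribution can be sketched with a small Monte Carlo over a Drake-like product. The log10 ranges below are hypothetical illustrations only; they are not the distributions used in the paper, and the factor names are simplified.

```python
import math
import random

# Hypothetical log10 ranges for Drake-like factors, for illustration only;
# they are NOT the distributions used in the paper.
LOG10_RANGES = {
    "star formation rate": (0.0, 2.0),
    "fraction with planets": (-1.0, 0.0),
    "habitable planets per system": (-1.0, 0.0),
    "fraction developing life": (-30.0, 0.0),
    "fraction developing intelligence": (-3.0, 0.0),
    "fraction detectable": (-2.0, 0.0),
    "civilization lifetime (years)": (2.0, 9.0),
}

def log_uniform_mean(lo, hi):
    """Arithmetic mean of a variable whose log10 is uniform on [lo, hi]."""
    return (10.0 ** hi - 10.0 ** lo) / ((hi - lo) * math.log(10.0))

def point_estimate():
    """Multiply the mean of each factor, as a point-estimate calculation would."""
    return math.prod(log_uniform_mean(lo, hi) for lo, hi in LOG10_RANGES.values())

def prob_alone(n_samples, rng):
    """Monte Carlo estimate of P(N < 1) when every factor is sampled."""
    alone = 0
    for _ in range(n_samples):
        log10_n = sum(rng.uniform(lo, hi) for lo, hi in LOG10_RANGES.values())
        if log10_n < 0.0:
            alone += 1
    return alone / n_samples

rng = random.Random(2)
print(f"point estimate of N: {point_estimate():.3g}")
print(f"P(N < 1) under uncertainty: {prob_alone(50_000, rng):.2f}")
```

    Under these assumed ranges the point estimate suggests many civilizations, yet most of the sampled probability mass has fewer than one. The large mean is driven by a small number of optimistic draws.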

  17. 2020-kokotajlo.pdf: ⁠, Daniel Kokotajlo, Alexandra Oprea (2020-05-30; existential-risk):

    First, we argue that the appeal of effective altruism (henceforth, EA) depends substantially on a certain empirical premise we call the Heavy Tail Hypothesis (HTH), which characterizes the probability distribution of opportunities for doing good. Roughly, the HTH implies that the best causes, interventions, or charities produce orders of magnitude greater good than the average ones, constituting a substantial portion of the total amount of good caused by altruistic interventions. Next, we canvass arguments EAs have given for the existence of a positive (or “right”) heavy tail and argue that they can also apply in support of a negative (or “left”) heavy tail where counterproductive interventions do orders of magnitude more harm than ineffective or moderately harmful ones. Incorporating the other heavy tail of the distribution has important implications for the core activities of EA: effectiveness research, cause prioritization, and the assessment of altruistic interventions. It also informs the debate surrounding the institutional critique of EA.

  20. ⁠, Andreas Pavlogiannis, Josef Tkadlec, Krishnendu Chatterjee, Martin A. Nowak (2018-06-14):

    Because of the intrinsic randomness of the evolutionary process, a mutant with a fitness advantage has some chance to be selected but no certainty. Any experiment that searches for advantageous mutants will lose many of them due to random drift. It is therefore of great interest to find population structures that improve the odds of advantageous mutants. Such structures are called amplifiers of natural selection: they increase the probability that advantageous mutants are selected. Arbitrarily strong amplifiers guarantee the selection of advantageous mutants, even for very small fitness advantage. Despite intensive research over the past decade, arbitrarily strong amplifiers have remained rare. Here we show how to construct a large variety of them. Our amplifiers are so simple that they could be useful in biotechnology, when optimizing biological molecules, or as a diagnostic tool, when searching for faster dividing cells or viruses. They could also occur in natural population structures.

    In the evolutionary process, mutation generates new variants, while selection chooses between mutants that have different reproductive rates. Any new mutant is initially present at very low frequency and can easily be eliminated by random drift. The probability that the lineage of a new mutant eventually takes over the entire population is called the fixation probability. It is a key quantity of evolutionary dynamics and characterizes the rate of evolution.

    …In this work we resolve several open questions regarding strong amplification under uniform and temperature initialization. First, we show that there exists a vast variety of graphs with self-loops and weighted edges that are arbitrarily strong amplifiers for both uniform and temperature initialization. Moreover, many of those strong amplifiers are structurally simple, therefore they might be realizable in natural or laboratory setting. Second, we show that both self-loops and weighted edges are key features of strong amplification. Namely, we show that without either self-loops or weighted edges, no graph is a strong amplifier under temperature initialization, and no simple graph is a strong amplifier under uniform initialization.

    …In general, the probability depends not only on the graph, but also on the initial placement of the invading mutants…For a wide class of population structures [17], which include symmetric ones [28], the fixation probability is the same as for the well-mixed population.

    … A population structure is an arbitrarily strong amplifier (for brevity hereafter also called “strong amplifier”) if it ensures a fixation probability arbitrarily close to one for any advantageous mutant, r > 1. Strong amplifiers can only exist in the limit of large population size.

    Numerical studies [30] suggest that for spontaneously arising mutants and small population size, many unweighted graphs amplify for some values of r. But for a large population size, randomly constructed, unweighted graphs do not amplify [31]. Moreover, proven amplifiers for all values of r are rare. For spontaneously arising mutants (uniform initialization): (1) the Star has fixation probability of ~1 − 1/r² in the limit of large N, and is thus an amplifier [17, 32, 33]; (2) the Superstar (introduced in ref. 17, see also ref. 34) and the Incubator (introduced in refs. 35, 36), which are graphs with unbounded degree, are strong amplifiers.
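
    To make "amplifier" concrete, the Star's large-N limit of ~1 − 1/r² quoted above can be compared with the well-mixed baseline. The sketch uses the standard Moran-process fixation formula for a well-mixed population, (1 − 1/r) / (1 − 1/r^N), which is a textbook result rather than something stated in this excerpt. The function names are hypothetical, and the simulation is only a sanity check of the well-mixed formula, not of the Star.

```python
import random

def moran_fixation(r, n):
    """Fixation probability of one mutant of fitness r in a well-mixed
    population of size n (standard Moran-process result)."""
    if r == 1.0:
        return 1.0 / n
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))

def star_fixation_limit(r):
    """Large-N limit for the Star under uniform initialization, ~1 - 1/r^2,
    as quoted above."""
    return 1.0 - 1.0 / (r * r)

def simulate_well_mixed(r, n, n_trials, rng):
    """Monte Carlo check of moran_fixation via the embedded random walk:
    whenever the mutant count changes, it rises with probability r/(1+r)."""
    p_up = r / (1.0 + r)
    fixed = 0
    for _ in range(n_trials):
        mutants = 1
        while 0 < mutants < n:
            mutants += 1 if rng.random() < p_up else -1
        fixed += mutants == n
    return fixed / n_trials

for r in (1.05, 1.1, 1.5, 2.0):
    print(f"r={r}: well-mixed {moran_fixation(r, 1000):.4f}, "
          f"Star limit {star_fixation_limit(r):.4f}")
```

    For any r > 1 the Star limit exceeds the well-mixed value, which is what makes it an amplifier. A strong amplifier, by the definition quoted above, would push the probability arbitrarily close to 1.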

    Figure 1: Evolutionary dynamics in structured populations. Residents (yellow) and mutants (purple) differ in their reproductive rate. (a) A single mutant appears. The lineage of the mutant becomes extinct or reaches fixation. The probability that the mutant takes over the population is called “fixation probability”. (b) The classical, well-mixed population is described by a complete graph with self-loops. (Self-loops are not shown here.) (c) Isothermal structures do not change the fixation probability compared to the well-mixed population. (d) The Star is an amplifier for uniform initialization. (e) A self-loop means the offspring can replace the parent. Self-loops are a mathematical tool to assign different reproduction rates to different places. (f) The Superstar, which has unbounded degree in the limit of large population size, is a strong amplifier for uniform initialization. Its edges (shown as arrows) are directed which means that the connections are one-way.
    Figure 4: Infinite variety of strong amplifiers. Many topologies can be turned into arbitrarily strong amplifiers (Wheel (a), Triangular grid (b), Concentric circles (c), and Tree (d)). Each graph is partitioned into hub (orange) and branches (blue). The weights can be then assigned to the edges so that we obtain arbitrarily strong amplifiers. Thick edges receive large weights, whereas thin edges receive small (or zero) weights.

    …Intuitively, the weight assignment creates a sense of global flow in the branches, directed toward the hub. This guarantees that the first 2 steps happen with high probability. For the third step, we show that once the mutants fixate in the hub, they are extremely likely to resist all resident invasion attempts and instead they will invade and take over the branches one by one thereby fixating on the whole graph. For more detailed description, see “Methods” section “Construction of strong amplifiers”.

    Necessary conditions for amplification: Our main result shows that a large variety of population structures can provide strong amplification. A natural follow-up question concerns the features of population structures under which amplification can emerge. We complement our main result by proving that both weights and self-loops are essential for strong amplification. Thus, we establish a strong dichotomy. Without either weights or self-loops, no graph can be a strong amplifier under temperature initialization, and no simple graph can be a strong amplifier under uniform initialization. On the other hand, if we allow both weights and self-loops, strong amplification is ubiquitous.

    …Some naturally occurring population structures could be amplifiers of natural selection. For example, the germinal centers of the immune system might constitute amplifiers for the affinity maturation process of adaptive immunity [46]. Habitats of animals that are divided into multiple islands with a central breeding location could potentially also act as amplifiers of selection. Our theory helps to identify those structures in natural settings.

  21. Anime#on-development-hell

  22. Embryo-selection#multi-stage-selection