# Leaky Pipelines

Many multi-step processes look like ‘leaky pipelines’, where a fractional loss/​success happens at every step. Such multiplicative processes can often be modeled as a log-normal distribution (or power law), with counterintuitive implications like skewed output distributions and large final differences from small differences in per-step success rates.

2014-11-27–2022-10-01 · in progress
certainty: highly-likely
• The log-normal distribution (quincunx visualization from Li et al 2010) is a skewed distribution which is the multiplicative counterpart to the normal distribution: where the normal distribution applies when many independent parts are added, the log-normal applies when those parts instead multiply. A common example is latent variables multiplying to give a final output; a concrete example would be multiple successive liability threshold-like steps. Also like the normal, the log-normal enjoys many general properties, such as limit theorems and preservation under its characteristic operation (multiplication for the log-normal, addition for the normal).¹
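
The additive/multiplicative contrast can be seen in a small simulation. This is only a sketch: the uniform(0.5, 1.5) step-factors, the seed, the variable names, and the hand-rolled `skewness` helper are illustrative assumptions, not anything taken from Li et al 2010.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
n.sims  <- 20000
n.parts <- 8
# Positive per-step factors; uniform(0.5, 1.5) is an illustrative assumption.
parts <- matrix(runif(n.sims * n.parts, min = 0.5, max = 1.5), nrow = n.sims)

added      <- rowSums(parts)         # parts combined by addition
multiplied <- apply(parts, 1, prod)  # the same parts combined by multiplication

# Moment-based sample skewness (base R has no built-in skewness function).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

print(c(sum         = skewness(added),
        product     = skewness(multiplied),
        log.product = skewness(log(multiplied))))
print(c(mean = mean(multiplied), median = median(multiplied)))
```

The sum should come out roughly symmetric, while the product should be strongly right-skewed with its mean above its median; taking the log of the product turns it back into a sum, which is why its skew is much smaller in magnitude.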

• Some researchers are orders of magnitude more prolific and successful than others. Under a normal distribution conceptualization of scientific talent, this would be odd & require them to be many standard deviations beyond the norm on some ‘output’ variable. Shockley suggests that this isn’t so surprising if we imagine scientific research as more of a ‘pipeline’: a scientist has ideas, which feed into background research, which feeds into a series of experiments, which feed into writing up papers, then getting them published, then influencing other scientists, then back to getting ideas.

Each step is a different skill, which is plausibly normally-distributed, but each step relies on the output of a previous step: you can’t experiment on non-existent ideas, and you can only publish on that which you experimented on, etc. Few people have an impact by simply having a fabulous idea if they can’t be bothered to write it down. (Consider how much more impact Claude Shannon⁠, Euler, Ramanujan, or Gauss would have had if they had published more than they did.) So if one researcher is merely somewhat better than average at each step, they may wind up having a far larger output of important work than a researcher who is exactly average at each step.

Shockley notes that with 8 variables and an advantage of 50%, the output under a log-normal model would be increased by as much as 25×, eg:

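```r
simulateLogNormal <- function(advantage, n.variables, iters=100000) {
  regular <- 1  # baseline researcher's output (fixed at 1, not simulated)
  # each iteration multiplies n.variables normal draws centered on (1 + advantage)
  advantaged <- replicate(iters, Reduce(`*`, rnorm(n.variables, mean=(1+advantage), sd=1), 1))
  ma <- mean(advantaged)  # mean output of the advantaged researcher
  return(ma)
}
simulateLogNormal(0.5, 8)
# [1] 25.58716574  (one run; the exact value varies from run to run)
```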
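
The 25× figure can also be cross-checked analytically. As a sketch, assume the $n$ per-step factors $X_1, \dots, X_n$ are independent and each has mean $1 + a$, where $a$ is the per-step advantage (this notation is introduced here for illustration and is not Shockley's):

$$\mathbb{E}\left[\prod_{i=1}^{n} X_i\right] = \prod_{i=1}^{n} \mathbb{E}[X_i] = (1 + a)^n, \qquad (1 + 0.5)^8 = 1.5^8 \approx 25.63$$

This identity concerns only the mean and needs nothing beyond independence. The log-normal shape comes from a separate step: when the factors are positive, $\log \prod_i X_i = \sum_i \log X_i$ is a sum of independent terms, so the usual limit theorems for sums apply to the log of the output.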

With more variables, the output difference would be larger still; this is connected to the o-ring theory of productivity. This poses a challenge to those who expect small differences in ability to lead to small output differences, since the log-normal distribution is common in the real world. It also implies that if several stages can be optimized, the remainder will become a severe bottleneck.
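
Both points, growth with the number of steps and the bottleneck, can be made concrete with a few lines of arithmetic. This is a minimal sketch: `outputRatio` is a hypothetical helper, and the pass rates are made-up numbers chosen only for illustration.

```r
# Hypothetical helper: ratio of expected outputs when one pipeline is better
# by `advantage` at each of `n.steps` independent multiplicative steps.
outputRatio <- function(advantage, n.steps) (1 + advantage)^n.steps

n.steps <- c(1, 2, 4, 8, 16)
ratios <- data.frame(n.steps,
                     adv.10pct = outputRatio(0.10, n.steps),
                     adv.50pct = outputRatio(0.50, n.steps))
print(round(ratios, 2))

# Bottleneck: made-up per-stage pass rates; the overall yield is their product.
pass.rates <- c(0.5, 0.5, 0.5, 0.1)
optimized  <- c(1.0, 1.0, 1.0, 0.1)  # first three stages made perfect
print(c(before = prod(pass.rates), after = prod(optimized)))
```

Even with three of the four stages made perfect, the overall yield can never exceed the pass rate of the one stage left untouched.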

• “The Best And The Rest: Revisiting The Norm Of Normality Of Individual Performance”, O’Boyle & Aguinis 2012

• Drug Development:

• “Dissolving the Fermi Paradox”, Sandberg et al 2018 (the mean estimate of the Drake equation may be high but the distribution is wide and the median is much smaller than the mean, somewhat akin to Jensen’s inequality⁠/​inequality of arithmetic and geometric means)

• “Prospecting for Gold”, Cotton-Barratt 2016; “Counterproductive Altruism: The Other Heavy Tail”, Kokotajlo & Oprea 2020

• “Construction of arbitrarily strong amplifiers of natural selection using evolutionary graph theory”, Pavlogiannis et al 2018

• “Effectiveness is a Conjunction of Multipliers”

• See Also: On Development Hell⁠, Multi-Stage Selection

1. Someone asked if the product of correlated normal variables also yields a log-normal, the way the sum of correlated normals is still normal⁠; checking WP’s “product distribution” page, I suspect not.

Experimenting with random correlation matrices generated by `randcorr` to simulate out possible log-normals as `hist(apply(abs(mvrnorm(n=500, mu=rep(0,5), Sigma=randcorr(5))), 1, prod))`, the histograms look far more skewed & peaky to me than a regular log-normal, which is in accord with my intuitions about correlations between variables typically increasing variance and creating more extremes.