Regression To The Mean Fallacies

Regression to the mean is a general statistical phenomenon which leads to several widespread fallacies in analyzing & interpreting statistical results, such as residual confounding and Lord’s paradox.
bibliography⁠, psychology⁠, statistics⁠, genetics
2021-05-20–2021-11-13 · in progress · certainty: possible · importance: 5


causes ⁠, but it also leads to additional errors, particularly when combined with :


  1. This is part of why results in sociology/epidemiology/psychology are so unreliable: everything is correlated, yet not only do researchers usually not control for genetics at all, they don’t even control for the things they think they control for! You have not controlled for SES by throwing in a discretized income variable measured in one year plus a discretized college degree variable. Variables which correlate with or predict some outcome such as poverty may be doing no more than correcting some measurement error (frequently, given the heavy genetic loading of most outcomes, correcting for the omission of genetic information). This is why within-family designs are desirable even without worries about genetics: they hold constant shared-environment factors so you don’t need to measure or model them. Even a structural equation model (SEM) which explicitly incorporates measurement error may still have enough leakage to render ‘controlling’ misleading. Such confounding, where highly-imperfect correlations drive pseudo-causal effects (which are just regression to the mean), is doubtless a reason why so many apparently-well-controlled & highly-replicable correlations fail in RCTs.↩︎

  2. There are countless examples of incorrect interpretations of measured variables which are imperfectly correlated with their latent variables, requiring explicit correction for measurement error or range restriction, particularly when meta-analyzed (see Hunter & Schmidt 2004). For example, it is particularly common for researchers to claim that their favorite new trait (SES, personality, “emotional intelligence”, etc) correlates more with an outcome than IQ does, without noting that their sample has been selected to be high on IQ, or that their IQ measure has much more random error in it than their alternative (sometimes with an excuse about “second-order sampling error”), and so it is unsurprising yet uninformative if the raw correlation coefficient is larger, because the IQ correlate was biased towards zero much more heavily. This sort of argument-from-attenuated-variables is wrong, but doesn’t become a regression-to-the-mean fallacy until combined with something else.↩︎

  3. The draft version is “Two statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data”.↩︎

  4. Including but not limited to researcher malpractice; eg the use of “genome-wide statistical-significance” to filter hits ensures a “winner’s curse”, and (contra critics) given their +regression-to-the-mean.↩︎
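
The points in footnotes 1 and 2 can be illustrated with a toy simulation. This is only a sketch under assumed parameters, not an analysis of any real dataset: a latent confounder causes both an exposure `x` and an outcome `y`, `x` has zero causal effect on `y`, and the analyst can only ‘control’ for a noisy proxy of the confounder. All names (`simulate`, `partial_slope`, `noise_sd`, etc) and the unit variances are illustrative choices. Under this setup the naive slope of `y` on `x` is 0.5 in expectation, controlling for the true confounder brings it to 0, and controlling for a proxy with reliability 0.5 leaves a spurious 1⁄3; the proxy’s correlation with the outcome is likewise attenuated from about 0.71 to 0.5.

```python
import random
import statistics as st


def slope(x, y):
    """OLS slope of y regressed on x (with intercept)."""
    mx, my = st.fmean(x), st.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = st.fmean(x), st.fmean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def partial_slope(x, y, control):
    """Coefficient of x in y ~ x + control, via residualizing both on control."""
    return slope(residuals(control, x), residuals(control, y))


def corr(x, y):
    """Pearson correlation, computed from the OLS slope and the two SDs."""
    return slope(x, y) * st.pstdev(x) / st.pstdev(y)


def simulate(n=20000, noise_sd=1.0, seed=0):
    """Assumed toy model: latent -> x, latent -> y, and NO x -> y effect."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]       # true confounder
    x = [c + rng.gauss(0, 1) for c in latent]          # exposure
    y = [c + rng.gauss(0, 1) for c in latent]          # outcome; x plays no role
    proxy = [c + rng.gauss(0, noise_sd) for c in latent]  # noisy measurement
    return {
        "naive": slope(x, y),                         # no control at all
        "control_true": partial_slope(x, y, latent),  # perfect control
        "control_noisy": partial_slope(x, y, proxy),  # 'controlled', but leaky
        "r_true": corr(latent, y),                    # unattenuated correlation
        "r_noisy": corr(proxy, y),                    # attenuated by proxy noise
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>14}: {value:.3f}")
```

Raising `noise_sd` pushes `control_noisy` back towards the naive estimate, while shrinking it towards 0 recovers the perfectly-controlled result; the residual ‘effect’ of `x` is entirely a product of how badly the covariate was measured.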