Why Correlation Usually ≠ Causation

Correlations are often interpreted as evidence for causation, and this is often falsified; do causal graphs explain why this is so common, because the number of possible indirect paths greatly exceeds the direct paths necessary for useful manipulation?
statistics, philosophy, survey, Bayes, causality, insight-porn
2014-06-24–2019-12-09 in progress certainty: log importance: 10


It is widely understood that statistical correlation between two variables ≠ causation. Despite this admonition, people are overconfident in claiming correlations to support favored causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimate the prevalence of confounds / common-causation. I speculate that in realistic causal networks or DAGs, the number of possible correlations grows faster than the number of possible causal relationships. So confounds really are that common, and since people do not think in realistic DAGs but toy models, the imbalance also explains overconfidence.

I have noticed I seem to be unusually willing to bite the correlation ≠ causation bullet, and I think it’s due to an idea I had some time ago about the nature of reality. It ties into the cumulative evidence from meta-analyses, replication initiatives, and the history of randomized experiments: the Replication Crisis, correlations’ ability to predict randomized experiments in practice, behavioral genetics results from the genome-sequencing era, and countless failed social-engineering attempts.

Overview: The Current Situation

Here is how I currently understand the relationship between correlation and causality, and the collective findings of meta-scientific research:

  1. The Replication Crisis: a shockingly large fraction of psychological research (and research in other fields) is simple random noise which cannot be replicated, due to p-hacking, low statistical power, publication bias, and other sources of systematic error. The Replication Crisis can manufacture arbitrarily many spurious results by datamining, and this alone ensures a high error rate.

  2. Everything Is Correlated: when we systematically measure many variables at large scale, with n large enough to defeat the Replication Crisis problems and to establish with high confidence that a given correlation is real and estimated reasonably accurately, we find that ‘everything is correlated’—even things which seem to have no causal relationship whatsoever.

    This is true whether we compare aggregate environmental data over many people, phenotype data about individuals, or genetic data. They’re all correlated. Everything is correlated. If you fail to reject the null hypothesis with p < 0.05, you simply haven’t collected enough data yet. So, as Meehl asked, what does a correlation between 2 variables mean if we know perfectly well in advance that it will either be positive or negative, but we can’t predict which or how large it is?

  3. Interventions Usually Fail: empirically, most efforts to change human behavior in sociology, economics, and education fail in randomized evaluation, and the mean effect size of experiments in meta-analyses typically approaches zero, despite promising correlations.

  4. Correlations Poorly Predict Causation: so, we live in a world where research manufactures many spurious results and where, even once we see through the fake findings, finding a correlation is meaningless because everything is correlated to begin with; accordingly, correlations are little better than experimenting at random, which doesn’t work well either.

    Thus, unsurprisingly, in every field from medicine to economics, when we directly ask how well correlations predict subsequent randomized experiments, we find that the predictive power is poor. Despite the best efforts of all involved researchers, animal experiments, natural experiments, human expert judgment, our understanding of the relevant causal networks etc, the highest-quality correlational research still struggles to outperform a coin flip in predicting the results of a randomized experiment. This reflects both the contamination of spurious correlations from the Replication Crisis, but also the brute fact that, in practice, correlations don’t provide good guides to causal interventions.

    But why is correlation ≠ causation?

  5. Dense Causal Graphs: because, if we write down a causal graph consistent with ‘everything is correlated’ and the empirical facts of average null effects + unpredictive correlations, this implies that all variables are part of enormous dense causal graphs where each variable is connected to several others.

    And in such a graph, the number of ways in which a variable’s value is connected to another variable and provides information about it (ie. is correlated with it) grows extremely rapidly, while it provides a useful causal manipulation only if the other variable is connected to it solely by a narrow, specific sequence of one-way causal arrows; there are vastly more indirect connections than direct connections, so any given connection is vastly unlikely to be a direct one, and thus manipulating one variable typically will not affect the other variable.

  6. Incorrect Intuitions: This inequality between observable correlations and actual useful causal manipulability only grows with larger networks, and causal networks in fields like economics or biology are far more complex than those in more ordinary everyday domains like ‘catching a ball’.

    Our intuitions, formed in simple domains designed to have sparse causal networks (it would be bad if balls could make you do random things! your brain is carefully designed to control the influence of any outside forces & model the world as simple for planning purposes), turn out to be profoundly misleading in these other domains.

    Things like vision are profoundly complex, but our brains present to us simple processed illusions to assist decision-making; we have ‘folk psychology’ and ‘folk physics’, which are simple, useful, and dead-wrong as full descriptions of reality. Even physics students trained in Newtonian mechanics cannot easily override their Aristotelian ‘folk physics’ intuitions & correctly predict the movements of objects in orbit; it is unsurprising that ‘folk causality’ often performs badly, especially in extremely complex fields with ambiguous long-term outcomes on novel, inherently-difficult problems, where things positively conspire to fool human heuristics which are otherwise quite adaptive in everyday life—like medical research, where a researcher is fortunate if, during their entire career, they can formulate and then definitively prove or refute even a single major hypothesis.

  7. No, Really, Correlation ≠ Causation: This cognitive bias is why correlation ≠ causation is so difficult to internalize and accept, honored primarily in the breach even by sophisticated researchers; and it is why randomized experiments were developed so late in history, and remain neglected, counterintuitive, and criticized when run, despite routinely debunking the conventional wisdom of experts in almost every field.

    Because of this, we must treat correlational claims, and policy recommendations based on them, even more skeptically than we find intuitive, and it is unethical to make important decisions based on such weak evidence. The question should never be, “is it ethical to run a randomized experiment here?” but “could it possibly be ethical to not run a randomized experiment here?”

    What can be done about this, besides repeating ad nauseam that ‘correlation ≠ causation’? How can we replace wrong intuitions with right ones, so people feel that? Do we need to emphasize the many examples, like medical reversals of stents or back surgeries or blood transfusions? Do we need to…? Do we need interactive visualizations—“SimCity for causal graphs”?—to make it intuitive?

Confound it! Correlation is (usually) not causation! But why not?

The Problem

“Hubris is the greatest danger that accompanies formal data analysis…Let me lay down a few basics, none of which is easy for all to accept… 1. The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”

John Tukey (pg74–75, “Sunset Salvo”, 1986)

“Every time I write about the impossibility of effectively protecting digital files on a general-purpose computer, I get responses from people decrying the death of copyright. ‘How will authors and artists get paid for their work?’ they ask me. Truth be told, I don’t know. I feel rather like the physicist who just explained relativity to a group of would-be interstellar travelers, only to be asked: ‘How do you expect us to get to the stars, then?’ I’m sorry, but I don’t know that, either.”

Bruce Schneier, “Protecting Copyright in the Digital World”, 2001

Most scientifically-inclined people are reasonably aware that one of the major divides in research is that correlation ≠ causation: that having discovered some relationship between various data X and Y (not necessarily Pearson’s r, but any sort of mathematical or statistical relationship, whether it be a humble r or an opaque deep neural network’s predictions), we do not know how Y would change if we manipulated X. Y might increase, decrease, do something complicated, or remain implacably the same.
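This asymmetry is easy to demonstrate in simulation. A minimal sketch, assuming a hypothetical linear toy model of my own devising: a hidden variable C drives both X and Y, so they correlate observationally, yet randomizing X (the simulated manipulation) leaves Y untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
C = rng.normal(size=n)             # hidden common cause
X = 0.8 * C + rng.normal(size=n)   # X listens only to C
Y = 0.8 * C + rng.normal(size=n)   # Y listens only to C, never to X

print(np.corrcoef(X, Y)[0, 1])     # ~0.39: a real, stable observational correlation

# 'Manipulate' X: randomize it, which cuts the C -> X arrow (a crude do-operator).
X_do = rng.normal(size=n)
print(np.corrcoef(X_do, Y)[0, 1])  # ~0.00: changing X does nothing to Y
```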

Correlations Often Aren’t

This might be because the correlation is not a real one, and is spurious, in the sense that it would disappear if we gathered more data, being an artifact of biases; or it could be an artifact of our mathematical procedures, as in “spurious correlations” of ratios; or it is a Type I error, a correlation thrown up by the standard statistical problems we all know about, such as too-small n, false positives from sampling error (A & B just happened to sync together due to randomness), data-mining/multiple testing, p-hacking, data snooping, selection bias, publication bias, misconduct, inappropriate statistical tests, etc. The Replication Crisis has seriously shaken faith in the published research literature of many fields, and it’s clear that many correlations are over-estimated in strength by severalfold, or the sign is in fact in the opposite direction.

Correlation Often Isn’t Causation

I’ve read about those problems at length, and despite knowing about all that, there still seems to be a problem: I don’t think those issues explain away all the correlations which turn out to be confounds—correlation too often ≠ causation.

But let’s say we get past that and we have established beyond a reasonable doubt that some X and Y really do correlate. We still have not solved the problem.

A priori

This point can be made by listing examples of correlations where we intuitively know changing X should have no effect on Y, because a third variable is at work: the number of churches in a town may correlate with the number of bars, but we know that’s because both are related to how many people are in it; the number of pirates may correlate (inversely) with global temperatures (but we know pirates don’t control global warming, and it’s more likely something like economic development leads to suppression of piracy but also CO2 emissions); sales of ice cream may correlate with snake bites or violent crime or deaths from heat-stroke (but of course snakes don’t care about sabotaging ice cream sales); thin people may have better posture than fat people, but sitting upright does not seem like a plausible weight loss plan1; wearing XXXL clothing clearly doesn’t cause heart attacks, although one might wonder if diet soda causes obesity; black skin does not cause sickle cell anemia nor, to borrow an example from Pearson2, would black skin cause smallpox or malaria; more recently, part of the psychology behind vaccine-autism panics is that many vaccines are administered to children at the same age that autism would start becoming apparent… In these cases, we can see that the correlation, which is surely real (in the sense that we can go out and observe it any time we like), doesn’t work like we might think: there is some third variable which causes both X and Y, or it turns out we’ve gotten it backwards.

In fact, if we go out and look at large datasets, we will find that two variables being correlated is nothing special—because everything is correlated. As Paul Meehl noted, the correlations can seem completely arbitrary, yet are firmly established by extremely large n (eg. n = 57,000 & n = 50,000 in his 2 examples).
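To make Meehl’s point concrete, here is a quick sketch with hypothetical numbers (not his data): give two otherwise-unrelated variables a faint shared background factor, and at n = 57,000 the resulting correlation is trivially small yet decisively ‘statistically-significant’:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 57_000                          # one of Meehl's sample sizes
C = rng.normal(size=n)              # a faint shared background factor
X = 0.2 * C + rng.normal(size=n)
Y = 0.2 * C + rng.normal(size=n)

r, p = stats.pearsonr(X, Y)
print(f"r = {r:.3f}, p = {p:.1e}")  # r ~ 0.04 (negligible), p ~ 1e-19 (decisive)
```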

A posteriori

We can also establish this by simply looking at the results of randomized experiments which take a correlation and nail down what a manipulation does.

To measure this directly you need a clear set of correlations which are proposed to be causal, randomized experiments to establish what the true causal relationship is in each case, and both categories need to be sharply delineated in advance to avoid issues of cherrypicking and retroactively confirming a correlation. Then you’d be able to say something like ‘11 out of the 100 proposed A → B causal relationships panned out’, and start with a prior of 11% that in your case, A → B.

This sort of dataset is pretty rare, although the existing results tend to indicate that our prior should be low. (For example, one analysis of government jobs programs had data on randomized participants & others, permitting comparison of randomized inference to standard regression approaches; it found, roughly, that 0⁄12 regression estimates—many statistically-significant—were reasonably similar to the causal effect for one job program, & 4⁄12 for another job program, with the regression estimates for the former heavily biased.) Not great. Why are our best analyses & guesses at causal relationships so bad?

Shouldn’t It Be Easy?

We’d expect that the a priori odds are good, by the principle of indifference: 1⁄3! After all, you can divvy up the possibilities as:

  1. A causes B (A → B)

  2. B causes A (A ← B)

  3. both A and B are caused by C (A ← C → B)

    (Possibly in a complex way, or conditioning on unmentioned variables: eg. a phone-based survey inadvertently generating conclusions valid only for the phone-owning population, causing amusing pseudo-correlations.)

If it’s either #1 or #2, we’re good and we’ve found a causal relationship; it’s only outcome #3 which leaves us baffled & frustrated. If we were guessing at random, you’d expect us to still be right at least 33% of the time. (As in the joke about the lottery—you’ll either win or lose, and you don’t know which, so it’s 50-50, and you like dem odds!) And we can draw on all sorts of knowledge to do better.3

I think a lot of people tend to put a lot of weight on any observed correlation because of this intuition that a causal relationship is normal & probable because, well, “how else could this correlation happen if there’s no causal connection between A & B‽” And fair enough—there’s no grand cosmic conspiracy arranging matters to fool us by always putting in place a C factor to cause scenario #3, right? If you question people, of course they know correlation doesn’t necessarily mean causation—everyone knows that—since there’s always a chance of a lurking confound, and it would be great if you had a randomized experiment to draw on; but you think with the data you have, not the data you wish you had, and can’t let the perfect be the enemy of the better. So when someone finds a correlation between A and B, it’s no surprise that suddenly their language & attitude change and they seem to place great confidence in their favored causal relationship, even if they piously acknowledge “Yes, correlation is not causation, but… obviously hanging out with fat people will make you fat / parents’ smoking encourages their kids to smoke / when we gave babies a new drug, fewer went blind / female-named hurricanes increase death tolls due to sexistly underestimating women / Kaposi’s sarcoma correlates so highly with AIDS that it must be another consequence of HIV (actually caused by HHV-8, which is transmitted simultaneously with HIV) / vitamin and anti-oxidant use (among many other supplements) will save lives / marijuana use associates with and thus surely causes schizophrenia and other forms of insanity (despite increases in marijuana use not being followed by any schizophrenia increases) / hormone replacement therapy correlates with mortality reduction in women, so it definitely helps and doesn’t harm” etc.

Besides the intuitiveness of correlation=causation, we are also desperate and want to believe: correlative data is so rich and so plentiful, and experimental data so rare. If it is not usually the case that correlation=causation, then what exactly are we going to do for decisions and beliefs, and what exactly have we spent all our time to obtain? When I look at some dataset with a number of variables and I run a multiple regression and can report that variables A, B, and C are all statistically-significant and of large effect-size when regressed on D, all I have really done is learned something along the lines of “in a hypothetical dataset generated in the exact same way, if I somehow was lacking data on D, I could make a better prediction in a narrow mathematical sense of no importance (squared error) based on A/B/C”. I have not learned whether A/B/C cause D, or whether I could predict values of D in the future, or anything about how I could intervene and manipulate any of A–D, or anything like that—rather, I have learned a small point about prediction. To take a real example: when I learn that moderate alcohol consumption means the actuarial prediction of lifespan for drinkers should be increased slightly, why on earth would I care about this at all unless it was causal? When epidemiologists emerge from a huge survey reporting triumphantly that steak but not egg consumption slightly predicts decreased lifespan, why would anyone care, aside from perhaps life insurance companies? Have you ever been abducted by space aliens and ordered, as part of an inscrutable alien blood-sport, to take a set of data about Midwest Americans born 1960–1969 with dietary predictors you must combine linearly to create predictors of heart attacks under a squared error loss function, to outpredict your fellow abductees from across the galaxy? Probably not. Why would anyone give them grant money for this, why would they spend their time on this, why would they read each other’s papers unless they had a “quasi-religious faith”4 that these correlations were more than just some coefficients in a predictive model—that they were causal? To quote Rutter 2007, most discussions of correlations fall into two equally problematic camps:

…all behavioral scientists are taught that statistically significant correlations do not necessarily mean any kind of causative effect. Nevertheless, the literature is full of studies with findings that are exclusively based on correlational evidence. Researchers tend to fall into one of two camps with respect to how they react to the problem.

  1. First, there are those who are careful to use language that avoids any direct claim for causation, and yet, in the discussion section of their papers, they imply that the findings do indeed mean causation.
  2. Second, there are those that completely accept the inability to make a causal inference on the basis of simple correlation or association and, instead, take refuge in the claim that they are studying only associations and not causation. This second, “pure” approach sounds safer, but it is disingenuous because it is difficult to see why anyone would be interested in statistical associations or correlations if the findings were not in some way relevant to an understanding of causative mechanisms.

So, correlations tend to not be causation because it’s almost always #3, a shared cause. This commonness is contrary to our expectations, based on the simple & unobjectionable observation that of the 3 possible relationships, 2 are causal; and so we often reason as though correlation were strong evidence for causation. This leaves us with a paradox: experimental results seem to contradict intuition. To resolve the paradox, I need to offer a clear account of why shared causes/confounds are so common, and hopefully motivate a different set of intuitions.

What a Tangled Net We Weave When First We Practice to Believe

“…we think so much reversal is based on ‘We think something should work, and so we’re going to adopt it before we know that it actually does work,’ and one of the reasons for this is because that’s how medical education is structured. We learn the biochemistry, the physiology, the pathophysiology as the very first things in medical school. And over the first two years we kind of get convinced that everything works mechanistically the way we think it does.”

Adam Cifu5

Here’s where Bayes nets & causal networks (seen previously on LW & Michael Nielsen) come up. Even simulating the simplest possible model of linear regression, adding covariates barely increases the probability of correctly inferring the direction of causality, and the effect sizes remain badly imprecise (Walker 2014). And when networks are inferred on real-world data, they look gnarly: tons of nodes, tons of arrows pointing all over the place. Daphne Koller, early on in her Probabilistic Graphical Models course, shows an example from a medical setting where the network has like 600 nodes and you can’t understand it at all. When you look at a biological causal network like metabolism:

“A Toolkit Supporting Formal Reasoning about Causality in Metabolic Networks”

You start to appreciate how everything might be correlated with everything, but (usually) not cause each other.

This is not too surprising if you step back and think about it: life is complicated, we have limited resources, and everything has a lot of moving parts. (How many discrete parts does an airplane have? Or your car? Or a single cell? Or think about a chess player analyzing a position: ‘if my bishop goes there, then the other pawn can go here, which opens up a move there or here, but of course, they could also do that or try an en passant, in which case I’ll be down in material but up on initiative in the center, which causes an overall shift in tempo…’) Fortunately, these networks are still simple compared to what they could be, since most nodes aren’t directly connected to each other, which tamps down on the combinatorial explosion of possible networks. (How many different causal networks are possible if you have 600 nodes to play with? The exact answer is complicated, but it’s much larger than 2^600—so, very large!)
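In fact, the ‘very large’ can be made precise: the number of labeled DAGs on n nodes is OEIS A003024, computable by Robinson’s recurrence. A short sketch (the recurrence is standard; the code is merely illustrative):

```python
# Number of labeled DAGs on n nodes (OEIS A003024), via Robinson's recurrence:
#   a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k(n-k)) * a(n-k),  with a(0) = 1
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print(num_dags(4))   # 543
print(num_dags(10))  # ~4.2 * 10^18, already dwarfing 2^10 = 1024
```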

One interesting thing I managed to learn from PGM (before concluding it was too hard for me and I should try it later) was that in a Bayes net, even if two nodes were not in a simple direct correlation relationship A → B, you could still learn a lot about A from setting B to a value, even if the two nodes were ‘way across the network’ from each other. You could trace the influence flowing up and down the pathways to some surprisingly distant places if there weren’t any blockers.

The bigger the network, the more possible combinations of nodes there are to look for a pairwise correlation between. (eg. if there are 10 nodes/variables and you are looking at bivariate correlations, then you have 10 choose 2 = 45 possible comparisons; with 20 nodes, 190; and with 40, 780. 40 variables is not that many for many real-world problems.) A lot of these combos will yield some sort of correlation. But does the number of causal relationships go up as fast? I don’t think so (although I can’t prove it).
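The pair-counting arithmetic is just ‘n choose 2’, easily checked:

```python
from math import comb
for n in (10, 20, 40):
    print(n, comb(n, 2))  # 10 -> 45, 20 -> 190, 40 -> 780
```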

If not, then as causal networks get bigger, the number of genuine correlations will explode, but the number of genuine causal relationships will increase more slowly, and so the fraction of correlations which are also causal will collapse.

(Or more concretely: suppose you generated a randomly connected causal network with x nodes and y arrows, where each arrow has some random noise in it; count how many pairs of nodes are in a causal relationship; now, n times, initialize the root nodes to random values and generate a possible state of the network, storing the values for each node; count how many pairwise correlations there are between all the nodes using the n samples (using an appropriate significance test & alpha if one wants); divide the # of causal relationships by the # of correlations, and store; return to the beginning and resume with x+1 nodes and y+1 arrows… As one graphs each value of x against its respective estimated fraction, does the fraction head toward 0 as x increases? My thesis is that it does. Or, since there must be at least as many causal relationships in a graph as there are arrows, you could simply use that as an upper bound on the fraction.)
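Here is a minimal sketch of that simulation in Python. All concrete choices (linear Gaussian mechanisms, a fixed edge probability instead of a count of y arrows, and a crude |r| cutoff standing in for a significance test) are my own assumptions, and how sharply the fraction falls will depend on them:

```python
import itertools
import numpy as np

def causal_vs_correlated(x: int, n: int = 5000, edge_p: float = 0.1,
                         r_cut: float = 0.05, seed: int = 0) -> float:
    """Fraction of correlated node pairs that are also causally related."""
    rng = np.random.default_rng(seed)
    # Random DAG: allow arrows i -> j only for i < j (a fixed topological order).
    adj = np.triu(rng.random((x, x)) < edge_p, k=1)
    weights = adj * rng.normal(size=(x, x))

    # Linear-Gaussian SEM: each node = weighted sum of its parents + fresh noise.
    data = np.zeros((n, x))
    for j in range(x):  # fill columns in topological order, so parents come first
        data[:, j] = data @ weights[:, j] + rng.normal(size=n)

    # Causal pairs: (i, j) joined by a directed path (boolean transitive closure).
    reach = adj.copy()
    for k in range(x):
        reach |= np.outer(reach[:, k], reach[k, :])
    n_causal = int(reach.sum())

    # Correlated pairs: |r| above a cutoff (a crude stand-in for a significance test).
    corr = np.corrcoef(data, rowvar=False)
    n_corr = sum(abs(corr[i, j]) > r_cut
                 for i, j in itertools.combinations(range(x), 2))
    return n_causal / max(n_corr, 1)

for x in (10, 20, 40, 80):
    print(x, round(causal_vs_correlated(x), 2))
```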

It turns out, we weren’t supposed to be reasoning ‘there are 3 categories of possible relationships, so we start with 33%’, but rather: ‘there is only one explanation “A causes B”, only one explanation “B causes A”, but there are many explanations of the form “C1 causes A and B”, “C2 causes A and B”, “C3 causes A and B”…’, and the more nodes in a field’s true causal networks (psychology or biology vs physics, say), the bigger this last category will be.

The real world is the largest of causal networks, so it is unsurprising that most correlations are not causal, even after we clamp down our data collection to narrow domains. Hence, our prior for “A causes B” is not 50% (it’s either true or false) nor is it 33% (either A causes B, B causes A, or mutual cause C), but something much smaller: the number of causal relationships divided by the number of pairwise correlations for a graph, which ratio can be roughly estimated on a field-by-field basis by looking at existing work, or directly for a particular problem (perhaps one could derive the fraction based on the properties of the smallest inferrable graph that fits large datasets in that field). And since the larger a correlation relative to the usual correlations for a field, the more likely the two nodes are to be close in the causal network and hence more likely to be joined causally, one could even give causality estimates based on the size of a correlation (eg. an r = 0.9 leaves less room for confounding than an r of 0.1, but how much will depend on the causal network).

This is exactly what we see. How do you treat cancer? Thousands of treatments get tried before one works. How do you deal with poverty? Most programs are not even wrong. Or how do you fix societal woes in general? Most attempts fail miserably, and the higher-quality your studies, the worse attempts look (leading to the “Iron Law of Evaluation”). This even explains Meehl’s “crud factor” and Andrew Gelman’s dictum about how coefficients are never zero: the reason large datasets find most of their variables to have non-zero correlations (often reaching statistical-significance) is because the data is being drawn from large complicated causal networks in which almost everything really is correlated with everything else.

And thus I was enlightened.

Comment

Since I know so little about causal modeling, I asked our local causal researcher Ilya Shpitser to maybe leave a comment about whether the above was trivially wrong / already-proven / well-known folklore / etc; for convenience, I’ll excerpt the core of his comment:

But does the number of causal relationships go up just as fast? I don’t think so (although at the moment I can’t prove it).

I am not sure exactly what you mean, but I can think of a formalization where this is not hard to show. We say A “structurally causes” B in a DAG G if and only if there is a directed path from A to B in G. We say A is “structurally dependent” with B in a DAG G if and only if there is a marginal d-connecting path from A to B in G.

A marginal d-connecting path between two nodes is a path with no consecutive edges of the form * → * ← * (that is, no colliders on the path). In other words, all directed paths are marginal d-connecting, but the opposite isn’t true.

The justification for this definition is that if A “structurally causes” B in a DAG G, then if we were to intervene on A, we would observe B change (but not vice versa) in “most” distributions that arise from causal structures consistent with G. Similarly, if A and B are “structurally dependent” in a DAG G, then in “most” distributions consistent with G, A and B would be marginally dependent (e.g. what you probably mean when you say ‘correlations are there’).

I qualify with “most” because we cannot simultaneously represent dependences and independences by a graph, so we have to choose. People have chosen to represent independences. That is, if in a DAG G some arrow is missing, then in any distribution (causal structure) consistent with G, there is some sort of independence (missing effect). But if the arrow is not missing we cannot say anything. Maybe there is dependence, maybe there is independence. An arrow may be present in G, and there may still be independence in a distribution consistent with G. We call such distributions “unfaithful” to G. If we pick distributions consistent with G randomly, we are unlikely to hit on unfaithful ones (the subset of all distributions consistent with G that is unfaithful to G has measure zero), but Nature does not pick randomly… so unfaithful distributions are a worry. They may arise for systematic reasons (maybe equilibrium of a feedback process in bio?)

If you accept the above definition, then clearly, for a DAG with n vertices, the number of pairwise structural dependence relationships is an upper bound on the number of pairwise structural causal relationships. I am not aware of anyone having worked out the exact combinatorics here, but it’s clear there are many, many more paths for structural dependence than paths for structural causality.


But what you actually want is not a DAG with n vertices, but another type of graph with n vertices. The “Universe DAG” has a lot of vertices, but what we actually observe is a very small subset of these vertices, and we marginalize over the rest. The trouble is, if you start with a distribution that is consistent with a DAG, and you marginalize over some things, you may end up with a distribution that isn’t well represented by a DAG. Or: “DAG models aren’t closed under marginalization.”

That is, if our DAG is A → B ← H → C ← D, and we marginalize over H because we do not observe H, what we get is a distribution where no DAG can properly represent all conditional independences. We need another kind of graph.

In fact, people have come up with a mixed graph (containing → arrows and ⟺ arrows) to represent margins of DAGs. Here → means the same as in a causal DAG, but ⟺ means “there is some sort of common cause/confounder that we don’t want to explicitly write down.” Note: ⟺ is not a correlative arrow; it is still encoding something causal (the presence of a hidden common cause or causes). I am being loose here—in fact it is the absence of arrows that means things, not the presence.

I do a lot of work on these kinds of graphs, because these graphs are the sensible representation of data we typically get—drawn from a marginal of a joint distribution consistent with a big unknown DAG.

But the combinatorics work out the same in these graphs—the number of marginal d-connecting paths is much bigger than the number of directed paths. This is probably the source of your intuition. Of course, what often happens is you do have a (weak) causal link between A and B, but a much stronger non-causal link between A and B through an unobserved common parent. So the causal link is hard to find without “tricks.”
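To illustrate the combinatorics Shpitser describes, here is a toy sketch (my own example DAG, not his): enumerate every simple path between every pair of nodes, and classify each path as directed and/or collider-free (marginal d-connecting):

```python
import itertools
import networkx as nx

# Toy DAG: A -> B <- C -> D <- E -> F
G = nx.DiGraph([("A", "B"), ("C", "B"), ("C", "D"), ("E", "D"), ("E", "F")])

def is_directed(path):  # all arrows run the same way along the path
    return (all(G.has_edge(u, v) for u, v in zip(path, path[1:])) or
            all(G.has_edge(v, u) for u, v in zip(path, path[1:])))

def is_d_connecting(path):  # no interior node is a collider (* -> v <- *)
    return not any(G.has_edge(path[i - 1], path[i]) and G.has_edge(path[i + 1], path[i])
                   for i in range(1, len(path) - 1))

U = G.to_undirected()
directed = d_connecting = 0
for s, t in itertools.combinations(G.nodes, 2):
    for path in nx.all_simple_paths(U, s, t):
        directed += is_directed(path)
        d_connecting += is_d_connecting(path)

print(directed, d_connecting)  # 5 directed paths vs 7 d-connecting paths
```

Even in this 6-node toy, structural-dependence paths outnumber structural-causation paths, and the gap widens rapidly as nodes and arrows are added.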

What is to be done?

Shouting 2+2=4 From the Rooftops

“There is nothing that plays worse in our culture than seeming to be the stodgy defender of old ideas, no matter how true those ideas may be. Luckily, at this point the orthodoxy of the academic economists is very much a minority position among intellectuals in general; one can seem to be a courageous maverick, boldly challenging the powers that be, by reciting the contents of a standard textbook. It has worked for me!”

Paul Krugman, “Ricardo’s Difficult Idea”

To go on a tangent: why is it so important that we hammer this in?

The unreliability is bad enough, but I’m also worried that the knowledge that correlation ≠ causation, one of the core ideas of the scientific method and fundamental to fields like modern medicine, is going underappreciated and is being abandoned by meta-contrarians due to its inconvenience. Pointing it out is “nothing helpful” or “meaningless”, and justified skepticism is actually just “a dumb-ass thing to say”, a “statistical cliché that closes threads and ends debates, the freshman platitude turned final shutdown” often used by “party poopers” & “Internet blowhards” to serve an “agenda” & is sometimes “a dog whistle”; in practice, such people seem to go well beyond the XKCD comic and proceed to take any correlations they like as strong evidence for causation, treating any disagreement as an unsophisticated middlebrow dismissal & denialism.

Insisting correlation ≅ causation (“Bilge”, 2013)

So it’s unsurprising that one so often runs into researchers for whom indeed correlation=causation (we certainly wouldn’t want to be freshmen or Internet blowhards, would we?). It is common to use causal language and make recommendations (Prasad et al 2013), but even when researchers don’t, you can be sure to see them confidently talking causally to other researchers or journalists or officials. (I’ve noticed that this sort of constant motte-and-bailey slide, from vague mentions of how results are correlative tucked away at the end of the paper to freely dispensing advice for policymakers about how their research proves X should be implemented, is particularly common in medicine, sociology, and education.)

Bandying phrases with meta-contrarians won’t help much here; I agree with them that correlation ought to be some evidence for causation. eg. if I suspect that A → B, and I collect data and establish beyond doubt that A & B correlate at r = 0.7, surely this observation, which is consistent with my theory, should boost my confidence in my theory, just as an observation like r = 0.0001 would trouble me greatly. But how much…?

As it is, it seems we fall readily into intellectual traps of our own making. When you believe every correlation adds support to your causal theory, you just get more and more wrong as you collect more data.

Heuristics & Biases

(Figure: a paper’s discussion section vs. quotes in the media.)

Now, assuming the foregoing to be right (which I’m not sure about; in particular, I’m dubious that correlations in causal nets really do increase much faster than causal relations do), what’s the psychology of this? I see a few major ways that people might be incorrectly reasoning when they overestimate the evidence given by a correlation:

  • they might be aware of the imbalance between correlations and causation, but underestimate how much more common correlation becomes compared to causation.

    This could be shown by giving causal diagrams and seeing how elicited probability changes with the size of the diagrams: if the probability is constant, then the subjects would seem to be considering the relationship in isolation and ignoring the context.

    It might be remediable by showing a network and jarring people out of a simplistic comparison approach.

  • they might not be reasoning in a causal-net framework at all, but starting from the naive 33% base-rate you get when you treat all 3 kinds of causal relationships equally.

    This could be shown by eliciting estimates and seeing whether the estimates tend to look like base rates of 33% and modifications thereof.

    Sterner measures might be needed: could we draw causal nets with not just arrows showing influence but also another kind of arrow showing correlations? For example, the causal arrows could be drawn in black, inverse correlations drawn in red, and regular correlations drawn in green. The picture would be rather messy, but simply by comparing how few black arrows there are to how many green and red ones, it might visually make the case that correlation is much more common than causation.

  • alternately, they may really be reasoning causally and suffer from a truly deep & persistent cognitive illusion: that when people say ‘correlation’ it’s really a kind of causation, because they don’t understand the technical meaning of ‘correlation’ in the first place (which is not as unlikely as it may sound, given examples like the demonstration of the persistence of Aristotelian folk-physics in physics students, as if all they had learned was guessing passwords; on the test used, see eg. Halloun & Hestenes 1985 & Hestenes et al 1992); in which case it’s not surprising that if they think they’ve been told a relationship is ‘causation’, then they’ll think the relationship is causation. Ilya remarks:

    Judea Pearl has this hypothesis that a lot of probabilistic fallacies/paradoxes/biases are due to the fact that causal and not probabilistic relationships are what our brain natively thinks about. So eg. Simpson’s paradox is surprising because we intuitively think of a conditional distribution (where conditioning can change anything!) as a kind of “interventional distribution” (no Simpson’s-type reversal under interventions: “Understanding Simpson’s Paradox”, Pearl 2014 [see also Pearl’s comments on Nielsen’s blog]).

    This hypothesis would claim that people who haven’t looked into the math just interpret statements about conditional probabilities as being about “interventional probabilities” (or whatever their intuitive analogue of a causal thing is).

    This might be testable by trying to identify simple examples where the two approaches diverge, similar to Hestenes’s quiz for diagnosing belief in folk-physics.

Appendix

Everything correlates with everything

Statistical folklore asserts that “everything is correlated”: in any real-world dataset, most or all measured variables will have non-zero correlations, even between variables which appear to be completely independent of each other, and these correlations are not merely sampling-error flukes but will appear in large-scale datasets to arbitrarily designated levels of statistical-significance or posterior probability.

This raises serious questions for null-hypothesis statistical-significance testing, as it implies that the null hypothesis of 0 will always be rejected with sufficient data, meaning that a failure to reject only implies insufficient data, and provides no actual test or confirmation of a theory. Even a directional prediction is minimal confirmation, since there is a 50% chance of picking the right direction at random.

It also has implications for conceptualizations of theories & causal models, interpretations of structural models, and other statistical principles such as the “sparsity principle”.

Main article: “Everything Is Correlated”.


  1. Although this may have been suggested:

    I used to read a magazine called Milo that covered a bunch of different strength sports. I ended my subscription after reading an article in which an entirely serious author wrote about how he noticed that, shortly after he started hearing birds singing in the morning, plants started to grow. His conclusion was that birdsong made plants grow. If I remember correctly, he then concluded that it was the vibrations in the birdsong that made the plants grow, therefore vibrations were good for strength, therefore you could make your muscles grow through being exposed to certain types of vibrations, i.e. birdsong. It was my favorite article of all time, just for the way the guy started out so absurdly wrong and just kept digging.

    I used to read old weight training books. In one of them the author proudly recalled how his secretary had asked him for advice on how to lose weight. This guy went around studying all the secretaries and noticed that the thin ones sat more upright compared to the fat ones. He then recommended to his secretary that she sit more upright, and if she did this she would lose weight. What I loved most about that whole story was that the guy was so proud of his analysis and conclusion that he made it an entire chapter of his book, and that no one in the entire publishing chain from the writer to the editor to the proofreader to the librarian who put the book on the shelves noticed any problems with any of it.

    ↩︎
  2. Slate provides a nice example from Pearson’s The Grammar of Science (pg407):

    All causation as we have defined it is correlation, but the converse is not necessarily true, i.e. where we find correlation we cannot always predict causation. In a mixed African population of Kaffirs and Europeans, the former may be more subject to smallpox, yet it would be useless to assert darkness of skin (and not absence of vaccination) as a cause.

    ↩︎
  3. Like temporal order or biological plausibility—for example, in medicine you can generally rule out some of the relationships this way: if you find a correlation between taking supertetrohydracyline™ and then later one’s depression (or flu symptoms or…) getting better, what does this mean? We have 3 general patterns: A → B, A ← B, and A ← C → B. It seems unlikely that #2 (A ← B), ‘curing depression causes taking supertetrohydracyline™ previously in time’, is true, since that requires time travel; we can rule that one out. So the causal relationship is probably either #1 (A → B), direct causation (supertetrohydracyline™ cures depression), or #3 (A ← C → B), a common cause and confounding, in which some third variable is responsible for both outcomes—like ‘doctors prescribe supertetrohydracyline™ to patients who are getting better’, or some other process leading to differential treatment, such as doctors prescribing supertetrohydracyline™ to patients they think have the best prognosis. We may not know which, but at least the temporal order did let us rule out one of the 3 possibilities, which is a start.↩︎

  4. I borrow this phrase from the paper “Looking to the 21st century: have we learned from our mistakes, or are we doomed to compound them?”, Shapiro 2004:

    In 1968, when I attended a course in epidemiology 101, Dick Monson was fond of pointing out that when it comes to relative risk estimates, epidemiologists are not intellectually superior to apes. Like them, we can count only three numbers: 1, 2 and BIG (I am indebted to Allen Mitchell for Figure 7). In adequately designed studies we can be reasonably confident about BIG relative risks, sometimes; we can be only guardedly confident about relative risk estimates of the order of 2.0, occasionally; we can hardly ever be confident about estimates of less than 2.0, and when estimates are much below 2.0, we are quite simply out of business. Epidemiologists have only primitive tools, which for small relative risks are too crude to enable us to distinguish between bias, confounding and causation.

    …To illustrate that point, I have to allude to a problem that is usually avoided because to mention it in public is considered impolite: I refer to bias (unconscious, to be sure, but bias all the same) on the part of the investigator. And in order not to obscure the issue by considering studies of questionable quality, I have chosen the example of putatively causal (or preventive) associations published by the Nurses Health Study (NHS). For that study, the investigators have repeatedly claimed that their methods are almost perfect. Over the years, the NHS investigators have published a torrent of papers and Figure 8 gives an entirely fictitious but nonetheless valid distribution of the relative risk estimates derived from them (for relative risk estimates of less than unity, assume the inverse values). The overwhelming majority of the estimates have been less than 2 and mostly less than 1.5, and the great majority have been interpreted as causal (or preventive). Well, perhaps they are and perhaps they are not: we cannot tell. But, perhaps as a matter of quasi-religious faith, the investigators have to believe that the small risk increments they have observed can be interpreted and that they can be interpreted as causal (or preventive). Otherwise they can hardly justify their own existence. They have no choice but to ignore Feinstein’s dictum [Several years ago, Alvan Feinstein made the point that if some scientific fallacy is demonstrated and if it cannot be rebutted, a convenient way around the problem is simply to pretend that it does not exist and to ignore it.]

    ↩︎
  5. Apropos of Ending Medical Reversal, estimating a ~40% error rate in medical interventions.↩︎