2021-stephan.pdf: “Interpolating Causal Mechanisms: The Paradox of Knowing More”, (2021-03):
Causal knowledge is not static; it is constantly modified based on new evidence. The present set of seven experiments explores 1 important case of causal belief revision that has been neglected in research so far: causal interpolations.
A simple prototypic case of an interpolation is a situation in which we initially have knowledge about a causal relation or a positive covariation between 2 variables but later become interested in the mechanism linking these 2 variables. Our key finding is that the interpolation of mechanism variables tends to be misrepresented, which leads to the paradox of knowing more: The more people know about a mechanism, the weaker they tend to find the probabilistic relation between the 2 variables (i.e., weakening effect). Indeed, in all our experiments we found that, despite identical learning data about 2 variables, the probability linking the 2 variables was judged higher when follow-up research showed that the 2 variables were assumed to be directly causally linked (i.e., C → E) than when participants were instructed that the causal relation is in fact mediated by a variable representing a component of the mechanism (M; i.e., C → M → E).
Our explanation of the weakening effect is that people often confuse discoveries of preexisting but unknown mechanisms with situations in which new variables are being added to a previously simpler causal model, thus violating causal stability assumptions in natural kind domains. The experiments test several implications of this hypothesis.
[Keywords: belief revision, causal Bayes nets, causal reasoning, interpolation, probabilistic reasoning]
[Original OSF data; remember, “it all adds up to normality”.]
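[The weakening intuition can be illustrated with a toy chain C → M → E (illustrative numbers, not taken from the paper): when a reasoner estimates the C–E link by multiplying through an interpolated mediator, the chained estimate comes out weaker than either individual link, even though a correctly calibrated mediator should leave P(E|C) unchanged:]

```python
# Toy illustration (hypothetical numbers): in a Markov chain C -> M -> E,
# P(E|C) is obtained by marginalizing over the mediator M:
#   P(E|C) = P(E|M) * P(M|C) + P(E|not M) * (1 - P(M|C))
p_m_given_c = 0.9      # strength of the C -> M link
p_e_given_m = 0.9      # strength of the M -> E link
p_e_given_not_m = 0.1  # leakage when the mediator is absent

p_e_given_c = p_e_given_m * p_m_given_c + p_e_given_not_m * (1 - p_m_given_c)
print(p_e_given_c)  # 0.82 -- weaker than either individual link (0.9)
```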
2020-kaufman.pdf: “Commentary: Cynical epidemiology”, Jay S. Kaufman
2019-gordon.pdf: “A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook”, (2019-05-04):
Measuring the causal effects of digital advertising remains challenging despite the availability of granular data. Unobservable factors make exposure endogenous, and advertising’s effect on outcomes tends to be small. In principle, these concerns could be addressed using randomized controlled trials (RCTs). In practice, few online ad campaigns rely on RCTs and instead use observational methods to estimate ad effects. We assess empirically whether the variation in data typically available in the advertising industry enables observational methods to recover the causal effects of online advertising. Using data from 15 U.S. advertising experiments at Facebook comprising 500 million user-experiment observations and 1.6 billion ad impressions, we contrast the experimental results to those obtained from multiple observational models. The observational methods often fail to produce the same effects as the randomized experiments, even after conditioning on extensive demographic and behavioral variables. In our setting, advances in causal inference methods do not allow us to isolate the exogenous variation needed to estimate the treatment effects. We also characterize the incremental explanatory power our data would require to enable observational methods to successfully measure advertising effects. Our findings suggest that commonly used observational approaches based on the data usually available in the industry often fail to accurately measure the true effect of advertising.
2019-song.pdf: “Effect of a Workplace Wellness Program on Employee Health and Economic Outcomes: A Randomized Clinical Trial”, American Medical Association
2019-schellenberg.pdf: “Correlation = causation? Music training, psychology, and neuroscience”, E. Glenn Schellenberg
2017-allamee.pdf: “Percutaneous coronary intervention in stable angina (ORBITA): a double-blind, randomised controlled trial”, (2017-11-02):
Background: Symptomatic relief is the primary goal of percutaneous coronary intervention (PCI) in stable angina and is commonly observed clinically. However, there is no evidence from blinded, placebo-controlled randomised trials to show its efficacy.
Methods: ORBITA is a blinded, multicentre randomised trial of PCI versus a placebo procedure for angina relief that was done at five study sites in the UK. We enrolled patients with severe (≥70%) single-vessel stenoses. After enrolment, patients received 6 weeks of medication optimisation. Patients then had pre-randomisation assessments with cardiopulmonary exercise testing, symptom questionnaires, and dobutamine stress echocardiography. Patients were randomised 1:1 to undergo PCI or a placebo procedure by use of an automated online randomisation tool. After 6 weeks of follow-up, the assessments done before randomisation were repeated at the final assessment. The primary endpoint was difference in exercise time increment between groups. All analyses were based on the intention-to-treat principle and the study population contained all participants who underwent randomisation. This study is registered with ClinicalTrials.gov, number NCT02062593.
Findings: ORBITA enrolled 230 patients with ischaemic symptoms. After the medication optimisation phase and between Jan 6, 2014, and Aug 11, 2017, 200 patients underwent randomisation, with 105 patients assigned PCI and 95 assigned the placebo procedure. Lesions had mean area stenosis of 84.4% (SD 10.2), fractional flow reserve of 0.69 (0.16), and instantaneous wave-free ratio of 0.76 (0.22). There was no statistically-significant difference in the primary endpoint of exercise time increment between groups (PCI minus placebo 16.6 s, 95% CI −8.9 to 42.0, p = 0.200). There were no deaths. Serious adverse events included four pressure-wire related complications in the placebo group, which required PCI, and five major bleeding events, including two in the PCI group and three in the placebo group.
Interpretation: In patients with medically treated angina and severe coronary stenosis, PCI did not increase exercise time by more than the effect of a placebo procedure. The efficacy of invasive procedures can be assessed with a placebo control, as is standard for pharmacotherapy.
2013-prasad.pdf: “Observational studies often make clinical practice recommendations: an empirical evaluation of authors' attitudes”, Vinay Prasad, Joel Jorgenson, John P. A. Ioannidis, Adam Cifu
2010-foster.pdf: “Causal Inference and Developmental Psychology”, E. Michael Foster
2001-ioannidis.pdf: “Comparison of Evidence of Treatment Effects in Randomized and Nonrandomized Studies”, (2001-08-01):
Context: There is substantial debate about whether the results of nonrandomized studies are consistent with the results of randomized controlled trials on the same topic.
Objectives: To compare results of randomized and nonrandomized studies that evaluated medical interventions and to examine characteristics that may explain discrepancies between randomized and nonrandomized studies.
Data Sources: MEDLINE (1966–March 2000), the Cochrane Library (Issue 3, 2000), and major journals were searched.
Study Selection: Forty-five diverse topics were identified for which both randomized trials (n = 240) and nonrandomized studies (n = 168) had been performed and had been considered in meta-analyses of binary outcomes.
Data Extraction: Data on events per patient in each study arm and design and characteristics of each study considered in each meta-analysis were extracted and synthesized separately for randomized and nonrandomized studies.
Data Synthesis: Very good correlation was observed between the summary odds ratios of randomized and nonrandomized studies (r = 0.75; p < 0.001); however, nonrandomized studies tended to show larger treatment effects (larger in 28 topics vs smaller in 11; p = 0.009). Between-study heterogeneity was frequent among randomized trials alone (23%) and very frequent among nonrandomized studies alone (41%). The summary results of the 2 types of designs differed beyond chance in 7 cases (16%). Discrepancies beyond chance were less common when only prospective studies were considered (8%). Occasional differences in sample size and timing of publication were also noted between discrepant randomized and nonrandomized studies. In 28 cases (62%), the natural logarithm of the odds ratio differed by at least 50%, and in 15 cases (33%), the odds ratio varied at least 2-fold between nonrandomized studies and randomized trials.
Conclusions: Despite good correlation between randomized trials and nonrandomized studies—in particular, prospective studies—discrepancies beyond chance do occur and differences in estimated magnitude of treatment effect are very common.
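[The synthesis step described above — pooling events per study arm into a summary odds ratio — can be sketched with the fixed-effect Mantel-Haenszel estimator. The 2×2 tables below are invented for illustration, not drawn from the paper's meta-analyses:]

```python
# Mantel-Haenszel summary odds ratio from per-study 2x2 tables:
# a = treated events, b = treated non-events, c = control events, d = control non-events.
# Illustrative data only.
studies = [
    (10, 90, 20, 80),   # hypothetical study 1
    (5, 45, 10, 40),    # hypothetical study 2
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in studies)
den = sum(b * c / (a + b + c + d) for a, b, c, d in studies)
or_mh = num / den
print(round(or_mh, 4))  # 0.4444: pooled odds ratio below 1 (fewer events under treatment)
```

[Running the same estimator separately over the randomized and the nonrandomized tables for a topic, then comparing the two pooled odds ratios, is the per-topic comparison the abstract summarizes.]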
2000-maclehose.pdf: “Study design and estimates of effectiveness”, MacLehose et al.
1999-dehejia.pdf: “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs”, (1999-10-01):
This article uses propensity score methods to estimate the treatment impact of the National Supported Work (NSW) Demonstration, a labor training program, on post-intervention earnings. We use data from Lalonde’s evaluation of nonexperimental methods that combine the treated units from a randomized evaluation of the NSW with nonexperimental comparison units drawn from survey datasets. We apply propensity score methods to this composite dataset and demonstrate that, relative to the estimators that Lalonde evaluates, propensity score estimates of the treatment impact are much closer to the experimental benchmark estimate. Propensity score methods assume that the variables associated with assignment to treatment are observed (referred to as ignorable treatment assignment, or selection on observables). Even under this assumption, it is difficult to control for differences between the treatment and comparison groups when they are dissimilar and when there are many pre-intervention variables. The estimated propensity score (the probability of assignment to treatment, conditional on pre-intervention variables) summarizes the pre-intervention variables. This offers a diagnostic on the comparability of the treatment and comparison groups, because one has only to compare the estimated propensity score across the two groups. We discuss several methods (such as stratification and matching) that use the propensity score to estimate the treatment impact. When the range of estimated propensity scores of the treatment and comparison groups overlap, these methods can estimate the treatment impact for the treatment group. A sensitivity analysis shows that our estimates are not sensitive to the specification of the estimated propensity score, but are sensitive to the assumption of selection on observables. 
We conclude that when the treatment and comparison groups overlap, and when the variables determining assignment to treatment are observed, these methods provide a means to estimate the treatment impact. Even though propensity score methods are not always applicable, they offer a diagnostic on the quality of nonexperimental comparison groups in terms of observable pre-intervention variables.
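[A minimal sketch of the stratification approach described above, on synthetic data — the data-generating process, effect size, and all variable names are invented for illustration. Fit a logistic regression of treatment on a pre-intervention covariate, then average within-stratum outcome differences across propensity-score quintiles:]

```python
import numpy as np

# Synthetic data with selection on an observable confounder x (true effect = 2.0).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                       # observed pre-intervention covariate
t = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(float)  # treatment more likely for high x
y = 2.0 * t + x + rng.normal(size=n)

naive = y[t == 1].mean() - y[t == 0].mean()  # biased upward: treated units have higher x

# Estimate the propensity score with logistic regression (Newton's method).
X = np.column_stack([np.ones(n), x])
w = np.zeros(2)
for _ in range(25):
    ps = 1 / (1 + np.exp(-X @ w))
    H = X.T @ (X * (ps * (1 - ps))[:, None])    # Hessian of the log-likelihood
    w += np.linalg.solve(H, X.T @ (t - ps))     # Newton step
ps = 1 / (1 + np.exp(-X @ w))

# Stratify on propensity-score quintiles; average within-stratum differences,
# weighting strata by treated counts (an ATT-style estimate).
stratum = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))
effects, weights = [], []
for s in range(5):
    m = stratum == s
    if (t[m] == 1).any() and (t[m] == 0).any():
        effects.append(y[m][t[m] == 1].mean() - y[m][t[m] == 0].mean())
        weights.append((t[m] == 1).sum())
att = np.average(effects, weights=weights)
print(round(naive, 2), round(att, 2))  # the stratified estimate is much closer to 2.0
```

[Comparing the distribution of `ps` across the two groups is the overlap diagnostic the abstract describes; the estimate is only trustworthy where the treated and comparison propensity scores overlap, and only under selection on observables.]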
1999-giraud.pdf: “Superadditive Correlation”, B. G. Giraud, John M. Heumann, Alan S. Lapedes
1998-britton.pdf: “Choosing Between Randomised and Non-randomised Studies”, Britton et al.
1995-friedlander.pdf: “Evaluating Program Evaluations: New Evidence on Commonly Used Nonexperimental Methods”, Daniel Friedlander, Philip K. Robins
1992-smith.pdf: “Smoking as 'independent' risk factor for suicide: illustration of an artifact from observational epidemiology?”, George Davey Smith, Andrew N. Phillips, James D. Neaton
1992-phillips.pdf: “Bias in relative odds estimation owing to imprecise measurement of correlated exposures”, Andrew N. Phillips, George Davey Smith
1991-phillips.pdf: “How independent are 'independent' effects? Relative risk estimation when correlated exposures are measured imprecisely”, Andrew N. Phillips, George Davey Smith
1990-horwitz.pdf: “Developing improved observational methods for evaluating therapeutic effectiveness”, (1990-11):
Therapeutic efficacy is often studied with observational surveys of patients whose treatments were selected non-experimentally. The results of these surveys are distrusted because of the fear that biased results occur in the absence of experimental principles, particularly randomization. The purpose of the current study was to develop and validate improved observational study designs by incorporating many of the design principles and patient assembly procedures of the randomized trial. The specific topic investigated was the prophylactic effectiveness of β-blocker therapy after an acute myocardial infarction.
To accomplish the research objective, three sets of data were compared. First, we developed a restricted cohort based on the eligibility criteria of the randomized clinical trial; second, we assembled an expanded cohort using the same design principles except for not restricting patient eligibility; and third, we used the data from the Beta Blocker Heart Attack Trial (BHAT), whose results served as the gold standard for comparison.
In this research, the treatment difference in death rates for the restricted cohort and the BHAT trial was nearly identical. In contrast, the expanded cohort had a larger treatment difference than was observed in the BHAT trial. We also noted the important and largely neglected role that eligibility criteria may play in ensuring the validity of treatment comparisons and study outcomes. The new methodological strategies we developed may improve the quality of observational studies and may be useful in assessing the efficacy of the many medical/surgical therapies that cannot be tested with randomized clinical trials.
1990-hill.pdf: “Memories of the British streptomycin trial in tuberculosis: The First Randomized Trial”, Austin Bradford Hill
1987-fraker.pdf: “The Adequacy of Comparison Group Designs for Evaluations of Employment-Related Programs”, (1987):
This study investigates empirically the strengths and limitations of using experimental versus nonexperimental designs for evaluating employment and training programs. The assessment involves comparing results from an experimental-design study-the National Supported Work Demonstration-with the estimated impacts of Supported Work based on analyses using comparison groups constructed from the Current Population Surveys. The results indicate that nonexperimental designs cannot be relied on to estimate the effectiveness of employment programs. Impact estimates tend to be sensitive both to the comparison group construction methodology and to the analytic model used. There is currently no way a priori to ensure that the results of comparison group studies will be valid indicators of the program impacts.
[Keywords: public assistance programs, analytical models, analytical estimating, employment, control groups, estimation methods, random sampling, human resources, public works legislation, statistical-significance]
1984-yusuf.pdf: “Why do we need some large, simple randomized trials?”, Salim Yusuf, Rory Collins, Richard Peto
1966-box.pdf: “Use and Abuse of Regression”, George E. P. Box
1948-skinner.pdf: “'Superstition' in the Pigeon”, (1948-04):
“A pigeon is brought to a stable state of hunger by reducing it to 75% of its weight when well fed. It is put into an experimental cage for a few minutes each day. A food hopper attached to the cage may be swung into place so that the pigeon can eat from it. A solenoid and a timing relay hold the hopper in place for 5 sec. at each reinforcement. If a clock is now arranged to present the food hopper at regular intervals with no reference whatsoever to the bird’s behavior, operant conditioning usually takes place.” The bird tends to learn whatever response it is making when the hopper appears. The response may be extinguished and reconditioned. “The experiment might be said to demonstrate a sort of superstition. The bird behaves as if there were a causal relation between its behavior and the presentation of food, although such a relation is lacking.”
1926-yule.pdf: “Why do we Sometimes get Nonsense-Correlations between Time-Series?—A Study in Sampling and the Nature of Time-Series”, G. Udny Yule