1955-michie.pdf: “The Importance of Being Cross-Bred”, Donald Michie, Anne McLaren
Much medical research is observational. The reporting of observational studies is often of insufficient quality. Poor reporting hampers the assessment of the strengths and weaknesses of a study and the generalizability of its results. Taking into account empirical evidence and theoretical considerations, a group of methodologists, researchers, and editors developed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) recommendations to improve the quality of reporting of observational studies. The STROBE Statement consists of a checklist of 22 items, which relate to the title, abstract, introduction, methods, results and discussion sections of articles. Eighteen items are common to cohort studies, case-control studies and cross-sectional studies and four are specific to each of the three study designs. The STROBE Statement provides guidance to authors about how to improve the reporting of observational studies and facilitates critical appraisal and interpretation of studies by reviewers, journal editors and readers. This explanatory and elaboration document is intended to enhance the use, understanding, and dissemination of the STROBE Statement. The meaning and rationale for each checklist item are presented. For each item, one or several published examples and, where possible, references to relevant empirical studies and methodological literature are provided. Examples of useful flow diagrams are also included. The STROBE Statement, this document, and the associated Web site should be helpful resources to improve reporting of observational research.
Alessandro Liberati and colleagues present an Explanation and Elaboration of the PRISMA Statement, updated guidelines for the reporting of systematic reviews and meta-analyses.
Systematic reviews and meta-analyses are essential to summarize evidence relating to efficacy and safety of health care interventions accurately and reliably. The clarity and transparency of these reports, however, is not optimal. Poor reporting of systematic reviews diminishes their value to clinicians, policy makers, and other users.
Since the development of the QUOROM (QUality Of Reporting Of Meta-analysis) Statement—a reporting guideline published in 1999—there have been several conceptual, methodological, and practical advances regarding the conduct and reporting of systematic reviews and meta-analyses. Also, reviews of published systematic reviews have found that key information about these studies is often poorly reported. Realizing these issues, an international group that included experienced authors and methodologists developed PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) as an evolution of the original QUOROM guideline for systematic reviews and meta-analyses of evaluations of health care interventions.
The PRISMA Statement consists of a 27-item checklist and a four-phase flow diagram. The checklist includes items deemed essential for transparent reporting of a systematic review. In this Explanation and Elaboration document, we explain the meaning and rationale for each checklist item. For each item, we include an example of good reporting and, where possible, references to relevant empirical studies and methodological literature. The PRISMA Statement, this document, and the associated Web site (http://www.prisma-statement.org/) should be helpful resources to improve reporting of systematic reviews and meta-analyses.
1982-bouchard.pdf: “Identical Twins Reared Apart: Reanalysis or Pseudo-analysis?”, Thomas J. Bouchard Jr.
1975-jackson.pdf: “Intelligence and Ideology [book review of _The Science and Politics of I.Q._, Leon J. Kamin]”, D. N. Jackson
1938-fisher.pdf: “Presidential address to the first Indian statistical congress”, R. A. Fisher
1939-pearson.pdf: “"Student" as Statistician”, (1939-01-00; ):
[Egon Pearson describes Student, or Gosset, as a statistician: Student corresponded widely with young statisticians/mathematicians, encouraging them, and having an outsized influence not reflected in his publications. Student’s preferred statistical tools were remarkably simple, focused on correlations and standard deviations, but wielded effectively in the analysis and efficient design of experiments (particularly agricultural experiments), and he was an early decision-theorist, focused on practical problems connected to his Guinness Brewery job, which detachment from academia partially explains why he didn’t publish methods or results immediately or often. The need to handle the small n of the brewery led to his work on small-sample approximations rather than, like Pearson et al in the Galton biometric tradition, relying on collecting large datasets and using asymptotic methods, and Student carried out one of the first Monte Carlo simulations.]
1976-savage.pdf#page=46: “On Rereading R. A. Fisher [Fisher Memorial lecture, with comments]”, Leonard J. Savage, John Pratt, Bradley Efron, Churchill Eisenhart, Bruno de Finetti, D. A. S. Fraser, V. P. Godambe, I. J. Good, O. Kempthorne, Stephen M. Stigler, I. Richard Savage
1814-laplace-philosophicalessayonprobabilities-ch5probabilitiestestimonies.pdf: “Philosophical Essay on Probabilities, Chapter 11: Concerning the Probabilities of Testimonies”, (1814; ):
The majority of our opinions being founded on the probability of proofs it is indeed important to submit it to calculus. Things it is true often become impossible by the difficulty of appreciating the veracity of witnesses and by the great number of circumstances which accompany the deeds they attest; but one is able in several cases to resolve the problems which have much analogy with the questions which are proposed and whose solutions may be regarded as suitable approximations to guide and to defend us against the errors and the dangers of false reasoning to which we are exposed. An approximation of this kind, when it is well made, is always preferable to the most specious reasonings.
We would give no credence to the testimony of a man who should attest to us that in throwing a hundred dice into the air they had all fallen on the same face. If we had ourselves been spectators of this event we should believe our own eyes only after having carefully examined all the circumstances, and after having brought in the testimonies of other eyes in order to be quite sure that there had been neither hallucination nor deception. But after this examination we should not hesitate to admit it in spite of its extreme improbability; and no one would be tempted, in order to explain it, to recur to a denial of the laws of vision. We ought to conclude from it that the probability of the constancy of the laws of nature is for us greater than this, that the event in question has not taken place at all, a probability greater than that of the majority of historical facts which we regard as incontestable. One may judge by this the immense weight of testimonies necessary to admit a suspension of natural laws, and how improper it would be to apply to this case the ordinary rules of criticism. All those who without offering this immensity of testimonies support this when making recitals of events contrary to those laws, decrease rather than augment the belief which they wish to inspire; for then those recitals render very probable the error or the falsehood of their authors. But that which diminishes the belief of educated men increases often that of the uneducated, always greedy for the wonderful.
The action of time enfeebles then, without ceasing, the probability of historical facts just as it changes the most durable monuments. One can indeed diminish it by multiplying and conserving the testimonies and the monuments which support them. Printing offers for this purpose a great means, unfortunately unknown to the ancients. In spite of the infinite advantages which it procures the physical and moral revolutions by which the surface of this globe will always be agitated will end, in conjunction with the inevitable effect of time, by rendering doubtful after thousands of years the historical facts regarded to-day as the most certain.
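[The dice example in the excerpt above can be made concrete with a small Bayesian calculation. The following is a minimal sketch, not Laplace's own derivation: the witness accuracy figures (`HIT` and `FALSE_ALARM`) are invented for illustration, and the witnesses are assumed to be fully independent, which real testimony rarely is. The only number taken from the text is the event itself, a hundred fair dice all landing on the same face.]

```python
from fractions import Fraction


def posterior_after_testimony(prior, hit_rate, false_alarm_rate, n_witnesses=1):
    """Posterior probability of an event after n independent witnesses attest to it.

    hit_rate: probability a witness reports the event when it happened.
    false_alarm_rate: probability a witness reports it when it did not.
    Both rates are hypothetical inputs; independence is an assumption.
    """
    prior_odds = prior / (1 - prior)
    likelihood_ratio = Fraction(hit_rate) / Fraction(false_alarm_rate)
    posterior_odds = prior_odds * likelihood_ratio ** n_witnesses
    return posterior_odds / (1 + posterior_odds)


def witnesses_needed(prior, hit_rate, false_alarm_rate, threshold=Fraction(1, 2)):
    """Smallest number of agreeing witnesses that lifts the posterior above threshold."""
    n = 0
    while posterior_after_testimony(prior, hit_rate, false_alarm_rate, n) <= threshold:
        n += 1
    return n


# 100 fair dice all showing the same face: 6 faces, each with probability (1/6)^100.
PRIOR_ALL_SAME = Fraction(6, 6**100)

# Hypothetical witness quality (assumed, not from the source).
HIT = Fraction(99, 100)
FALSE_ALARM = Fraction(1, 100)

if __name__ == "__main__":
    one = posterior_after_testimony(PRIOR_ALL_SAME, HIT, FALSE_ALARM, 1)
    print(f"posterior after one witness: {float(one):.3e}")
    print("witnesses needed for better-than-even odds:",
          witnesses_needed(PRIOR_ALL_SAME, HIT, FALSE_ALARM))
```

[Under these made-up rates a single witness leaves the event wildly improbable, and it takes 39 independent, agreeing witnesses before the event becomes more likely than not, which is one way to read "the immense weight of testimonies necessary".]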
Several hundred research groups attempted replications of published effects in so-called Many Labs studies involving thousands of research participants. Given this enormous investment, it seems timely to assess what has been learned and what can be learned from this type of project. My evaluation addresses 4 questions: First, do these replication studies inform us about the replicability of social psychological research? Second, can replications detect fraud? Third, does the failure to replicate a finding indicate that the original result was wrong? Finally, do these replications help to support or disprove any social psychological theories? Although evidence of replication failures resulted in important methodological changes, the 2015 Open Science Collaboration findings sufficed to make the point. To assess the state of social psychology, we have to evaluate theories rather than randomly selected research findings.
… In only 2, and 2 rather unusual, cases was fraud discovered by replication failure. Even the Stapel fraud was revealed by his research students, who had become suspicious of his unusual success in empirically supporting the most daring hypotheses (Stroebe et al 2012). With the new rule that data for published research have to be made available, it can be expected that fraud cases will increasingly be detected because of suspicious data patterns.
Another reason for replications being poor fraud detectors is that clever fraudsters, who stick closely to predictions that are plausible in the light of existing literature, have a very good chance that their research will be successfully replicated by their colleagues. (If Stapel had kept to this recipe and not become overconfident in his later research, his fraud might never have been detected.) For example, DeCoster & Claypool 2004 published a meta-analysis of priming effects on impression formation supporting a general model of information bias. The literature was very coherent and supportive of their model. The only unexpected finding was that effect sizes of studies conducted in Europe were substantially greater than those of American studies. They attributed this to cultural differences. However, when I checked the authorship of the European studies, it turned out that the majority had been conducted by Stapel, and many of these studies later turned out to be fraudulent (Levelt et al 2012). Thus, in inventing data, Stapel managed to get the priming effects right but overestimated the size of these effects.
2017-allamee.pdf: “Percutaneous coronary intervention in stable angina (ORBITA): a double-blind, randomised controlled trial”, (2017-11-02; ):
Background: Symptomatic relief is the primary goal of percutaneous coronary intervention (PCI) in stable angina and is commonly observed clinically. However, there is no evidence from blinded, placebo-controlled randomised trials to show its efficacy.
Methods: ORBITA is a blinded, multicentre randomised trial of PCI versus a placebo procedure for angina relief that was done at five study sites in the UK. We enrolled patients with severe (≥70%) single-vessel stenoses. After enrolment, patients received 6 weeks of medication optimisation. Patients then had pre-randomisation assessments with cardiopulmonary exercise testing, symptom questionnaires, and dobutamine stress echocardiography. Patients were randomised 1:1 to undergo PCI or a placebo procedure by use of an automated online randomisation tool. After 6 weeks of follow-up, the assessments done before randomisation were repeated at the final assessment. The primary endpoint was difference in exercise time increment between groups. All analyses were based on the intention-to-treat principle and the study population contained all participants who underwent randomisation. This study is registered with ClinicalTrials.gov, number NCT02062593.
Findings: ORBITA enrolled 230 patients with ischaemic symptoms. After the medication optimisation phase and between Jan 6, 2014, and Aug 11, 2017, 200 patients underwent randomisation, with 105 patients assigned PCI and 95 assigned the placebo procedure. Lesions had mean area stenosis of 84.4% (SD 10.2), fractional flow reserve of 0.69 (0.16), and instantaneous wave-free ratio of 0.76 (0.22). There was no statistically-significant difference in the primary endpoint of exercise time increment between groups (PCI minus placebo 16.6 s, 95% CI −8.9 to 42.0, p = 0.200). There were no deaths. Serious adverse events included four pressure-wire related complications in the placebo group, which required PCI, and five major bleeding events, including two in the PCI group and three in the placebo group.
Interpretation: In patients with medically treated angina and severe coronary stenosis, PCI did not increase exercise time by more than the effect of a placebo procedure. The efficacy of invasive procedures can be assessed with a placebo control, as is standard for pharmacotherapy.
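[As a consistency check on the Findings above, the reported estimate (16.6 s), 95% CI (−8.9 to 42.0), and p-value (0.200) can be compared with each other. The sketch below backs out a standard error from the interval width and recomputes a two-sided p-value. It assumes a symmetric normal-approximation interval; the trial's actual model may have used a t distribution or covariate adjustment, so only approximate agreement should be expected. The function name is hypothetical.]

```python
import math


def p_value_from_ci(estimate, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z, and a two-sided p-value from a point estimate and 95% CI.

    Assumes the interval is estimate +/- z_crit * SE (normal approximation).
    """
    se = (ci_high - ci_low) / (2 * z_crit)
    z = estimate / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    # Values taken from the ORBITA abstract: PCI minus placebo, in seconds.
    se, z, p = p_value_from_ci(16.6, -8.9, 42.0)
    print(f"SE = {se:.2f} s, z = {z:.3f}, two-sided p = {p:.3f}")
```

[The implied standard error is roughly 13 s and the recomputed p-value is about 0.20, in line with the reported p = 0.200. The same interval also shows that effects as large as about 42 s are not excluded by the data.]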
First, the title, which makes an excellent point. It can be valuable to think about measurement, comparison, and variation, even if commonly-used statistical methods can mislead.
This reminds me of the idea in decision analysis that the most important thing is not the solution of the decision tree but rather what you decide to put in the tree in the first place, or even, stepping back, what are your goals. The idea is that the threat of decision analysis is more powerful than its execution (as Chrissy Hesse might say): the decision-analytic thinking pushes you to think about costs and uncertainties and alternatives and opportunity costs, and that’s all valuable even if you never get around to performing the formal analysis. Similarly, I take Tong’s point that statistical thinking motivates you to consider design, data quality, bias, variance, conditioning, causal inference, and other concerns that will be relevant, whether or not they all go into a formal analysis.
That said, I have one concern, which is that “the threat is more powerful than the execution” only works if the threat is plausible. If you rule out the possibility of the execution, then the threat is empty. Similarly, while I understand the appeal of “Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science”, I think this might be good static advice, applicable right now, but not good dynamic advice: if we do away with statistical inference entirely (except in the very rare cases when no external assumptions are required to perform statistical modeling), then there may be less of a sense of the need for statistical thinking.
Overall, though, I agree with Tong’s message, and I think everybody should read his article.
[Fierce but witty critique by David Stove of philosophy throughout the ages and defense of Logical Positivism, with Christian theology, Neoplatonism, and German Idealism as examples. Logical Positivists took the easy way out: the problem with these philosophies is not that they are gibberish or meaningless, because at least then they would all be wrong in the same way and could perhaps be refuted in the same way, but that they each are wrong in a myriad of different ways, ways for which we have no existing “fallacy” defined, entire universes of new errors—undermining the hope of using reason or philosophy to make any kind of progress. What is wrong with philosophy, and ourselves, if we cannot even explain why these are so badly wrong after millennia of thought and debate?]
2020-arora.pdf: “The Changing Structure of American Innovation: Some Cautionary Remarks for Economic Growth”, (2020; ):
A defining feature of modern economic growth is the systematic application of science to advance technology. However, despite sustained progress in scientific knowledge, recent productivity growth in the United States has been disappointing. We review major changes in the American innovation ecosystem over the past century. The past three decades have been marked by a growing division of labor between universities focusing on research and large corporations focusing on development. Knowledge produced by universities is not often in a form that can be readily digested and turned into new goods and services. Small firms and university technology transfer offices cannot fully substitute for corporate research, which had previously integrated multiple disciplines at the scale required to solve substantial technical problems. Therefore, whereas the division of innovative labor may have raised the volume of science by universities, it has also slowed, at least for a period of time, the transformation of that knowledge into novel products and processes.
“The causal foundations of applied probability and statistics”, (2020-11-05):
Statistical science (as opposed to mathematical statistics) involves far more than probability theory, for it requires realistic causal models of data generators—even for purely descriptive goals. Statistical decision theory requires more causality: Rational decisions are actions taken to minimize costs while maximizing benefits, and thus require explication of causes of loss and gain. Competent statistical practice thus integrates logic, context, and probability into scientific inference and decision using narratives filled with causality. This reality was seen and accounted for intuitively by the founders of modern statistics, but was not well recognized in the ensuing statistical theory (which focused instead on the causally inert properties of probability measures). Nonetheless, both statistical foundations and basic statistics can and should be taught using formal causal models. The causal view of statistical science fits within a broader information-processing framework which illuminates and unifies frequentist, Bayesian, and related probability-based foundations of statistics. Causality theory can thus be seen as a key component connecting computation to contextual information, not extra-statistical but instead essential for sound statistical training and applications.