2020-wojtowicz.pdf: “From Probability to Consilience: How Explanatory Values Implement Bayesian Reasoning”, (2020-10-23):
- Recent experiments show that we value explanations for many reasons, such as predictive power and simplicity.
- Bayesian rational analysis provides a functional account of these values, along with concrete definitions that allow us to measure and compare them across a variety of contexts, including visual perception, politics, and science.
- These values include descriptiveness, co-explanation, and measures of simplicity such as parsimony and concision. The first two are associated with the evaluation of explanations in the light of experience, while the latter concern the intrinsic features of an explanation.
- Failures to explain well can be understood as imbalances in these values: a conspiracy theorist, for example, may over-rate co-explanation relative to simplicity, and many similar ‘failures to explain’ that we see in social life may be analyzable at this level.
Recent work in cognitive science has uncovered a diversity of explanatory values, or dimensions along which we judge explanations as better or worse. We propose a Bayesian account of these values that clarifies their function and shows how they fit together to guide explanation-making. The resulting taxonomy shows that core values from psychology, statistics, and the philosophy of science emerge from a common mathematical framework and provide insight into why people adopt the explanations they do. This framework not only operationalizes the explanatory virtues associated with, for example, scientific argument-making, but also enables us to reinterpret the explanatory vices that drive phenomena such as conspiracy theories, delusions, and extremist ideologies.
[Keywords: explanation, explanatory values, Bayesian cognition, rational analysis, simplicity, vice epistemology]
2020-miller.pdf: “Laplace’s Theories of Cognitive Illusions, Heuristics and Biases”, (2020-06-03):
In his book from the early 1800s, Essai Philosophique sur les Probabilités, the mathematician Pierre-Simon de Laplace anticipated many ideas developed within the past 50 years in cognitive psychology and behavioral economics, explaining human tendencies to deviate from norms of rationality in the presence of probability and uncertainty. A look at Laplace’s theories and reasoning is striking, both in how modern they seem, how much progress he made without the benefit of systematic experimentation, and the novelty of a few of his unexplored conjectures. We argue that this work points to these theories being more fundamental and less contingent on recent experimental findings than we might have thought.
2020-oaksford.pdf: “New Paradigms in the Psychology of Reasoning”, (2019-09-12):
The psychology of verbal reasoning initially compared performance with classical logic. In the last 25 years, a new paradigm has arisen, which focuses on knowledge-rich reasoning for communication and persuasion and is typically modeled using Bayesian probability theory rather than logic. This paradigm provides a new perspective on argumentation, explaining the rational persuasiveness of arguments that are logical fallacies. It also helps explain how and why people stray from logic when given deductive reasoning tasks. What appear to be erroneous responses, when compared against logic, often turn out to be rationally justified when seen in the richer rational framework of the new paradigm. Moreover, the same approach extends naturally to inductive reasoning tasks, in which people extrapolate beyond the data they are given and logic does not readily apply. We outline links between social and individual reasoning and set recent developments in the psychology of reasoning in the wider context of Bayesian cognitive science.
2019-wright.pdf: “Allocation to groups: Examples of Lord's paradox”, (2019-07-12):
Background: Educational and developmental psychologists often examine how groups change over time. Two analytic procedures, analysis of covariance (ANCOVA) and the gain score model, each seem well suited for the simplest situation, with just 2 groups and 2 time points. They can produce different results, a discrepancy known as Lord’s paradox.
Aims: Several factors should influence a researcher’s analytic choice, including whether the score from the initial time influences how people are assigned to groups. Examples of educational relevance are shown to help explain this to researchers and students. It is shown that a common method used to measure school effectiveness is biased against schools that serve students from groups that have historically performed poorly.
Methods and results: The examples come from sports and measuring educational effectiveness (e.g., for teachers or schools). A simulation study shows that if the covariate influences group allocation, the ANCOVA is preferred, but otherwise, the gain score model may be appropriate. Regression towards the mean is used to account for these findings.
Conclusions: Analysts should consider the relationship between the covariate and group allocation when deciding upon their analytic method. Because the influence of the covariate on group allocation may be complex, the appropriate method may be complex. Because the influence of the covariate on group allocation may be unknown, the choice of method may require several assumptions.
[Keywords: Lord’s paradox, value-added models, ANCOVA, educator equity]
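The contrast the abstract describes between the gain score model and ANCOVA can be illustrated with a small simulation. This is a minimal sketch under assumptions of our own, not the paper's simulation design: a latent ability plus independent noise at each time point, no true group effect, and allocation determined entirely by the initial score (the name `lord_paradox_demo` and all parameter values are hypothetical).

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _centered_cross_sum(a, b):
    """Sum of cross-products of a and b after centering each on its mean."""
    mean_a, mean_b = _mean(a), _mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def lord_paradox_demo(n=20000, seed=0):
    """Return (gain-score estimate, ANCOVA estimate) of a group effect that is truly zero."""
    rng = random.Random(seed)
    pre, post, group = [], [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)        # assumed latent ability
        x = ability + rng.gauss(0.0, 1.0)    # noisy score at time 1
        y = ability + rng.gauss(0.0, 1.0)    # noisy score at time 2, no group effect
        pre.append(x)
        post.append(y)
        group.append(1.0 if x > 0 else 0.0)  # allocation depends on the covariate

    # Gain score model: difference between groups in mean (post - pre).
    gains_1 = [y - x for x, y, g in zip(pre, post, group) if g == 1.0]
    gains_0 = [y - x for x, y, g in zip(pre, post, group) if g == 0.0]
    gain_effect = _mean(gains_1) - _mean(gains_0)

    # ANCOVA: coefficient on group in the least-squares fit post ~ pre + group,
    # using the closed form for a regression with two predictors.
    s_gg = _centered_cross_sum(group, group)
    s_xx = _centered_cross_sum(pre, pre)
    s_gx = _centered_cross_sum(group, pre)
    s_gy = _centered_cross_sum(group, post)
    s_xy = _centered_cross_sum(pre, post)
    ancova_effect = (s_gy * s_xx - s_xy * s_gx) / (s_gg * s_xx - s_gx ** 2)
    return gain_effect, ancova_effect


gain_effect, ancova_effect = lord_paradox_demo()
print(f"gain score estimate: {gain_effect:.3f}")
print(f"ANCOVA estimate:     {ancova_effect:.3f}")
```

In this setup the gain score estimate comes out clearly negative even though no effect exists, because high initial scorers regress towards the mean, while the ANCOVA estimate stays near zero. A different allocation mechanism could reverse which method is appropriate.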
2019-lynch.pdf: “Bayesian Statistics in Sociology: Past, Present, and Future”, (2019-01-01):
Although Bayes’ theorem has been around for more than 250 years, widespread application of the Bayesian approach only began in statistics in 1990. By 2000, Bayesian statistics had made considerable headway into social science, but even now its direct use is rare in articles in top sociology journals, perhaps because of a lack of knowledge about the topic. In this review, we provide an overview of the key ideas and terminology of Bayesian statistics, and we discuss articles in the top journals that have used or developed Bayesian methods over the last decade. In this process, we elucidate some of the advantages of the Bayesian approach. We highlight that many sociologists are, in fact, using Bayesian methods, even if they do not realize it, because techniques deployed by popular software packages often involve Bayesian logic and/or computation. Finally, we conclude by briefly discussing the future of Bayesian statistics in sociology.
2019-beaumont.pdf: “Approximate Bayesian Computation”, (2019-01-01):
Many of the statistical models that could provide an accurate, interesting, and testable explanation for the structure of a data set turn out to have intractable likelihood functions. The method of approximate Bayesian computation (ABC) has become a popular approach for tackling such models. This review gives an overview of the method and the main issues and challenges that are the subject of current research.
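The basic idea behind ABC can be sketched with rejection sampling: draw a parameter from the prior, simulate data, and keep the draw only if the simulated data fall close to the observed data. The example below is a toy illustration under our own assumptions, not code from the review. It deliberately uses a binomial model, whose likelihood is tractable, so that the result can be checked against the exact posterior; the function name, prior, tolerance, and data are all hypothetical.

```python
import random


def abc_rejection(observed_successes, n_trials, n_draws=20000, tolerance=0, seed=0):
    """Approximate posterior draws for a success probability via ABC rejection.

    Assumes a Uniform(0, 1) prior and uses the success count as the summary statistic.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw a candidate from the prior
        # Simulate a data set from the model given the candidate parameter.
        simulated = sum(rng.random() < theta for _ in range(n_trials))
        # Keep the candidate only if the simulated summary is close to the observed one.
        if abs(simulated - observed_successes) <= tolerance:
            accepted.append(theta)
    return accepted


samples = abc_rejection(observed_successes=14, n_trials=20)
posterior_mean = sum(samples) / len(samples)
exact_mean = (14 + 1) / (20 + 2)  # mean of the exact Beta(15, 7) posterior
print(f"accepted {len(samples)} draws; ABC mean {posterior_mean:.3f}, exact mean {exact_mean:.3f}")
```

With a tolerance of zero and a sufficient summary statistic, the accepted draws follow the exact posterior. In realistic applications the tolerance is positive and the summaries are not sufficient, which is where the approximation enters.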
2014-tenan.pdf: “Bayesian model selection: The steepest mountain to climb”, Simone Tenan, Robert B. O’Hara, Iris Hendriks, Giacomo Tavecchia
2012-kruschke.pdf: “Bayesian estimation supersedes the t test”, (2013-01-01):
Bayesian estimation for 2 groups provides complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data. The method handles outliers. The decision rule can accept the null value (unlike traditional t-tests) when certainty in the estimate is high (unlike Bayesian model comparison using Bayes factors). The method also yields precise estimates of statistical power for various research goals. The software and programs are free and run on Macintosh, Windows, and Linux platforms.
[Keywords: Bayesian statistics, effect size, robust estimation, Bayes factor, confidence interval]
2010-kruschke.pdf: “Bayesian data analysis”, John K. Kruschke
2010-stigler.pdf: “rssa_a0157 469..482”
2008-kesselman.pdf: “Verbal Probability Expressions In National Intelligence Estimates: A Comprehensive Analysis Of Trends From The Fifties Through Post-9/11”, (2008-05):
This research presents the findings of a study that analyzed words of estimative probability in the key judgments of National Intelligence Estimates from the 1950s through the 2000s. The research found that of the 50 words examined, only 13 were statistically-significant. Furthermore, interesting trends emerge when the words are broken down into English modals, terminology that conveys analytical assessments, and words employed by the National Intelligence Council as of 2006. One of the more intriguing findings is that the word ‘will’ has been by far the most popular among analysts, registering over 700 occurrences throughout the decades; however, a word of such certainty is problematic in the sense that intelligence should never deal with 100% certitude. The relatively low occurrence and wide variety of word usage across the decades demonstrate a real lack of consistency in the way analysts have been conveying assessments over the past 58 years. Finally, the researcher suggests the Kesselman List of Estimative Words for use in the IC. The word list takes into account the literature review findings as well as the results of this study in equating odds with verbal probabilities.
[Rachel’s lit review, for example, makes for very interesting reading. She has done a thorough search of not only the intelligence but also the business, linguistics and other literatures in order to find out how other disciplines have dealt with the problem of “What do we mean when we say something is ‘likely’…” She uncovered, for example, that, in medicine, words of estimative probability such as “likely”, “remote” and “probably” have taken on more or less fixed meanings due primarily to outside intervention or, as she put it, “legal ramifications”. Her comparative analysis of the results and approaches taken by these other disciplines is required reading for anyone in the Intelligence Community trying to understand how verbal expressions of probability are actually interpreted. The NIC’s list only became final in the last several years so it is arguable whether this list of nine words really captures the breadth of estimative word usage across the decades. Rather, it would be arguable if this chart didn’t make it crystal clear that the Intelligence Community has really relied on just two words, “probably” and “likely”, to express its estimates of probabilities for the last 60 years. All other words are used rarely or not at all.
Based on her research of what works and what doesn’t and which words seem to have the most consistent meanings to users, Rachel even offers her own list of estimative words along with their associated probabilities:
- Almost certain: 86–99%
- Highly likely: 71–85%
- Likely: 56–70%
- Chances a little better [or less] than even: 46–55%
- Unlikely: 31–45%
- Highly unlikely: 16–30%
- Remote: 1–15%
[See also “Decision by sampling”, Stewart et al 2006; “Processing Linguistic Probabilities: General Principles and Empirical Evidence”, Budescu & Wallsten 1995.]
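The word list above amounts to a lookup table from probability bands to phrases. A minimal sketch of that mapping follows; the bands are copied from the list, while the function name and the restriction to whole-number percentages are our own assumptions (the list leaves 0%, 100%, and fractional values between bands unassigned).

```python
# Bands (low %, high %, term) taken from the list of estimative words above.
KESSELMAN_BANDS = [
    (86, 99, "Almost certain"),
    (71, 85, "Highly likely"),
    (56, 70, "Likely"),
    (46, 55, "Chances a little better [or less] than even"),
    (31, 45, "Unlikely"),
    (16, 30, "Highly unlikely"),
    (1, 15, "Remote"),
]


def estimative_term(percent: int) -> str:
    """Return the estimative phrase whose band contains a whole-number percentage."""
    for low, high, term in KESSELMAN_BANDS:
        if low <= percent <= high:
            return term
    # 0 and 100 fall outside every band: the list assigns no word to certainty.
    raise ValueError(f"no estimative term covers {percent}%")


print(estimative_term(90))
print(estimative_term(50))
```

That 0% and 100% raise an error mirrors the abstract's point that intelligence estimates should not express complete certitude.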
Interpreting group differences observed in aggregated data is a practice that must be done with enormous care. Often the truth underlying such data is quite different than a naïve first look would indicate. The confusions that can arise are so perplexing that some of the more frequently occurring ones have been dubbed paradoxes. In this chapter we describe three of the best known of these paradoxes (Simpson’s Paradox, Kelley’s Paradox, and Lord’s Paradox) and illustrate them in a single data set. The data set contains the score distributions, separated by race, on the biological sciences component of the Medical College Admission Test (MCAT) and Step 1 of the United States Medical Licensing Examination™ (USMLE). Our goal in examining these data was to move toward a greater understanding of race differences in admissions policies in medical schools. As we demonstrate, the path toward this goal is hindered by differences in the score distributions, which give rise to these three paradoxes. The ease with which we were able to illustrate all of these paradoxes within a single data set is indicative of how widespread they are likely to be in practice.
2004-knill.pdf: “The Bayesian brain: the role of uncertainty in neural coding and computation”, (2004-12):
To use sensory information efficiently to make judgments and guide action in the world, the brain must represent and use information about uncertainty in its computations for perception and action. Bayesian methods have proven successful in building computational theories for perception and sensorimotor control, and psychophysics is providing a growing body of evidence that human perceptual computations are ‘Bayes’ optimal’. This leads to the ‘Bayesian coding hypothesis’: that the brain represents sensory information probabilistically, in the form of probability distributions. Several computational schemes have recently been proposed for how this might be achieved in populations of neurons. Neurophysiological data on the hypothesis, however, is almost non-existent. A major challenge for neuroscientists is to test these ideas experimentally, and so determine whether and how neurons code information about sensory uncertainty.
2003-korb.pdf: “Bayesian Informal Logic and Fallacy”, (2004-01-01):
Bayesian reasoning has been applied formally to statistical inference, machine learning and analysing scientific method. Here I apply it informally to more common forms of inference, namely natural language arguments. I analyse a variety of traditional fallacies, deductive, inductive and causal, and find more merit in them than is generally acknowledged. Bayesian principles provide a framework for understanding ordinary arguments which is well worth developing.
Interpreting group differences observed in aggregated data is a practice that must be done with enormous care. Often the truth underlying such data is quite different than a naïve first look would indicate. The confusions that can arise are so perplexing that some of the more frequently occurring ones have been dubbed paradoxes. This article describes two of these paradoxes (Simpson’s paradox and Lord’s paradox) and illustrates them in a single dataset. The dataset contains the score distributions, separated by race, on the biological sciences component of the Medical College Admission Test (MCAT) and Step 1 of the United States Medical Licensing Examination™ (USMLE). Our goal in examining these data was to move toward a greater understanding of race differences in admissions policies in medical schools. As we demonstrate, the path toward this goal is hindered by differences in the score distributions, which give rise to these two paradoxes. The ease with which we were able to illustrate both of these paradoxes within a single dataset is indicative of how widespread they are likely to be in practice.
[Keywords: group differences, Lord’s paradox, Medical College Admission Test, Rubin’s model for causal inference, Simpson’s paradox, standardization, United States Medical Licensing Examination]
2000-wainer.pdf: “Kelley's Paradox”, Howard Wainer
1996-kadane.pdf: “Statistical Issues in the Analysis of Data Gathered in the New Designs”, Joseph B. Kadane, Teddy Seidenfeld
1995-cavin.pdf: “Is There Sufficient Historical Evidence to Establish the Resurrection of Jesus?”, Robert Greg Cavin
1994-miller.pdf: “The Relevance of Group Membership for Personnel Selection: A Demonstration Using Bayes' Theorem”, (1994-09-01):
A Bayesian approach to problems of personnel selection implies a fundamental conflict between non-discrimination and merit selection. Groups–such as ethnic groups, sexes and races–do differ in various attributes relevant to vocational success, including intelligence and personality.
This journal has repeatedly discussed the technical and ethical issues raised by the existence of groups (races, sexes, ethnic groups) that frequently differ in abilities and other job-related characteristics (Eysenck 1991; Jensen 1992; Levin 1990, 1991). This paper is meant to add to that discussion by providing mathematical proof that consideration of such groups is, in general, necessary in selecting the best employees or students.
It is almost an article of faith that race, sex, religion, national origin, or similar classifications (which will be referred to here as groups) are irrelevant for hiring, given a goal of selecting the best candidates. The standard wisdom is that those selecting for school admission or employment should devise an unbiased (in the statistical sense) procedure which predicts individual performance, evaluate individuals with this, and then select the highest ranked individuals. However, analysis shows that even with statistically unbiased evaluation procedures, group membership may still be relevant. If the goal is to pick the best individuals for jobs or training, membership in the group with the lower average performance (the disadvantaged group) should properly be held against the individual. In general, not considering group membership and selecting the best candidates are mutually exclusive.
…Related Psychometric Discussions: How does the conclusion reached above about the relevance of group membership relate to discussions in the technical psychometric literature?
At least some psychometricians have been aware of the relevance of group membership. Hunter & Schmidt 1976 point out that differences in group means will typically lead to differences in intercepts. Jensen (1980, p. 94, Bias in Mental Testing) points out that the best estimate of true scores is obtained by regressing observed scores towards the mean, and that if there are 2 groups with different means, the downwards correction for the high scoring individuals will be greater for those from the low scoring group. Kelley (1947, p. 409, Fundamentals of Statistics) put it as follows: “This is an interesting equation in that it expresses the estimate of true ability as a weighted sum of 2 separate estimates, one based upon the individual’s observed score, X1, and the other based upon the mean of the group to which he belongs, M1. If the test is highly reliable, much weight is given to the test score and little to the group mean, and vice versa”, although he may not have been thinking of demographic groups. Cronbach, Gleser, Nanda, and Rajaratnam (1972, The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles) discuss the problem of deducing universe scores (essentially true scores in traditional terminology) from test data, recognizing that group means will be relevant. They even display an awareness that, since blacks normally score lower than whites, the logic of their reasoning calls for the use of higher cut-off scores for blacks than for whites (see p. 385). Mislevy (1993) also displays an awareness that group means are relevant, although he feels it would be unfair to use them.
In general, the relevance of group membership has been known to the specialist psychometric community, although few outside the community are aware of the effect. Thus, the contribution of Bayes’ theorem is to provide another demonstration, one that those outside the psychometric community may be more comfortable with.
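The weighted sum that Kelley describes in the quotation above is commonly written as: estimated true score = reliability × observed score + (1 − reliability) × group mean. A minimal sketch of that formula follows; the function name and all numbers in the example are hypothetical and chosen only for illustration.

```python
def kelley_true_score(observed: float, group_mean: float, reliability: float) -> float:
    """Estimate a true score as a reliability-weighted sum of the observed
    score (Kelley's X1) and the mean of the person's group (Kelley's M1)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return reliability * observed + (1.0 - reliability) * group_mean


# Hypothetical example: the same observed score, two groups with different means.
observed_score = 120.0
test_reliability = 0.8
estimate_low_mean_group = kelley_true_score(observed_score, 100.0, test_reliability)
estimate_high_mean_group = kelley_true_score(observed_score, 110.0, test_reliability)
print(estimate_low_mean_group, estimate_high_mean_group)
```

For an observed score above both group means, the estimate is pulled down further for the member of the lower-mean group, which is the asymmetry the passage attributes to Jensen. With a perfectly reliable test (reliability of 1) the group mean drops out entirely.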
1994-wright-subjectiveprobability.pdf: “Subjective Probability”, George Wright, Peter Ayton
1991-ohagan.pdf: “Bayes-Hermite quadrature”, Andrew O'Hagan
1990-stigler.pdf: “The 1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators”, (1990-02-01):
This article examines Stein’s paradox from the perspective of an earlier century and shows that from that point of view the phenomenon is transparent. Furthermore, this earlier perspective leads to a relatively simple rigorous proof of Stein’s result, and the perspective can be extended to cover other situations, such as the simultaneous estimation of several Poisson means.
The relationship of this perspective to other earlier work, including the empirical Bayes approach, is also discussed.
[Keywords: admissibility, Empirical Bayes, James-Stein estimation, Poisson distribution, regression, Stein paradox]
1984-norton.pdf: “The Double Exponential Distribution: Using Calculus to Find a Maximum Likelihood Estimator”, Robert M. Norton
This study has four purposes: to provide a comparison of discrimination methods; to explore the problems presented by techniques based strongly on Bayes’ theorem when they are used in a data analysis of large scale; to solve the authorship question of The Federalist papers; and to propose routine methods for solving other authorship problems.
Word counts are the variables used for discrimination. Since the topic written about heavily influences the rate with which a word is used, care in selection of words is necessary. The filler words of the language such as ‘an’, ‘of’, and ‘upon’, and, more generally, articles, prepositions, and conjunctions provide fairly stable rates, whereas more meaningful words like ‘war’, ‘executive’, and ‘legislature’ do not.
After an investigation of the distribution of these counts, the authors execute an analysis employing the usual discriminant function and an analysis based on Bayesian methods. The conclusion about the authorship problem is that Madison rather than Hamilton wrote all 12 of the disputed papers.
The findings about methods are presented in the closing section on conclusions.
This report, summarizing and abbreviating a forthcoming monograph, gives some of the results but very little of their empirical and theoretical foundation. It treats two of the four main studies presented in the monograph, and none of the side studies.
1950-good-probabilityandtheweighingofevidence.pdf: “Probability and the Weighing of Evidence”, I. J. Good
1922-ramsey.pdf: “Mr Keynes on Probability [review of J. M. Keynes, _A Treatise on Probability_ 1921]”, Frank P. Ramsey
1814-laplace-philosophicalessayonprobabilities-ch5probabilitiestestimonies.pdf: “Philosophical Essay on Probabilities, Chapter 11: Concerning the Probabilities of Testimonies”, (1814):
The majority of our opinions being founded on the probability of proofs it is indeed important to submit it to calculus. Things it is true often become impossible by the difficulty of appreciating the veracity of witnesses and by the great number of circumstances which accompany the deeds they attest; but one is able in several cases to resolve the problems which have much analogy with the questions which are proposed and whose solutions may be regarded as suitable approximations to guide and to defend us against the errors and the dangers of false reasoning to which we are exposed. An approximation of this kind, when it is well made, is always preferable to the most specious reasonings.
We would give no credence to the testimony of a man who should attest to us that in throwing a hundred dice into the air they had all fallen on the same face. If we had ourselves been spectators of this event we should believe our own eyes only after having carefully examined all the circumstances, and after having brought in the testimonies of other eyes in order to be quite sure that there had been neither hallucination nor deception. But after this examination we should not hesitate to admit it in spite of its extreme improbability; and no one would be tempted, in order to explain it, to recur to a denial of the laws of vision. We ought to conclude from it that the probability of the constancy of the laws of nature is for us greater than this, that the event in question has not taken place at all, a probability greater than that of the majority of historical facts which we regard as incontestable. One may judge by this the immense weight of testimonies necessary to admit a suspension of natural laws, and how improper it would be to apply to this case the ordinary rules of criticism. All those who without offering this immensity of testimonies support this when making recitals of events contrary to those laws, decrease rather than augment the belief which they wish to inspire; for then those recitals render very probable the error or the falsehood of their authors. But that which diminishes the belief of educated men increases often that of the uneducated, always greedy for the wonderful.
The action of time enfeebles then, without ceasing, the probability of historical facts just as it changes the most durable monuments. One can indeed diminish it by multiplying and conserving the testimonies and the monuments which support them. Printing offers for this purpose a great means, unfortunately unknown to the ancients. In spite of the infinite advantages which it procures the physical and moral revolutions by which the surface of this globe will always be agitated will end, in conjunction with the inevitable effect of time, by rendering doubtful after thousands of years the historical facts regarded to-day as the most certain.