 See Also

Links
 “A Systematic Review of Human Challenge Trials, Designs, and Safety”, Adams-Phipps et al 2022
 “The InterModel Vigorish (IMV): A Flexible and Portable Approach for Quantifying Predictive Accuracy With Binary Outcomes”, Domingue et al 2022
 “The Geometry of Decision-making in Individuals and Collectives”, Sridhar et al 2021
 “Noise Increases Anchoring Effects”, Lee & Morewedge 2021
 “Prior Knowledge Elicitation: The Past, Present, and Future”, Mikkola et al 2021
 “νSDDP: Neural Stochastic Dual Dynamic Programming”, Dai et al 2021
 “A Rational Reinterpretation of Dual-process Theories”, Milli et al 2021
 “Strategically Overconfident (to a Fault): How Self-promotion Motivates Advisor Confidence”, Van Zant 2021
 “TV Advertising Effectiveness and Profitability: Generalizable Results From 288 Brands”, Shapiro et al 2021
 “Learning to Hesitate”, Descamps et al 2021
 “Informational Herding, Optimal Experimentation, and Contrarianism”, Smith et al 2021
 “Adversarial Vulnerabilities of Human Decision-making”, Dezfouli et al 2020
 “Targeting for Long-term Outcomes”, Yang et al 2020
 “Learning Not to Learn: Nature versus Nurture in Silico”, Lange & Sprekeler 2020
 “Robust Decision Theory and Econometrics”, Chamberlain 2020
 “Speed-accuracy Tradeoff in Plants”, Ceccarini et al 2020
 “The Secret History of Facial Recognition: Sixty Years Ago, a Sharecropper’s Son Invented a Technology to Identify Faces. Then the Record of His Role All but Vanished. Who Was Woody Bledsoe, and Who Was He Working For?”, Raviv 2020
 “A/B Testing With Fat Tails”, Azevedo et al 2019
 “Bayesian Persuasion and Information Design”, Kamenica 2019
 “Generalizable and Robust TV Advertising Effects”, Shapiro et al 2019
 “How Should We Critique Research?”, Branwen 2019
 “Meta-learning of Sequential Strategies”, Ortega et al 2019
 “Is the FDA Too Conservative or Too Aggressive?: A Bayesian Decision Analysis of Clinical Trial Design”, Isakov et al 2019
 “Using the Results from Rigorous Multisite Evaluations to Inform Local Policy Decisions”, Orr et al 2019
 “Accounting Theory As a Bayesian Discipline”, Johnstone 2018
 “Evolution As Backstop for Reinforcement Learning”, Branwen 2018
 “Dog Cloning For Special Forces: Breed All You Can Breed”, Branwen 2018
 “Improving Width-based Planning With Compact Policies”, Junyent et al 2018
 “How to Train Your Oracle: The Delphi Method and Its Turbulent Youth in Operations Research and the Policy Sciences”, Dayé 2018
 “P-Hacking and False Discovery in A/B Testing”, Berman et al 2018
 “On Having Enough Socks”, Branwen 2017
 “An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits”, Sledge & Principe 2017
 “Toward a Rational and Mechanistic Account of Mental Effort”, Shenhav et al 2017
 “Pricing the Future in the 17th Century: Calculating Technologies in Competition”, Deringer 2017
 “Neural Combinatorial Optimization With Reinforcement Learning”, Bello et al 2017
 “Self-Blinded Mineral Water Taste Test”, Branwen 2017
 “The Kelly Coin-Flipping Game: Exact Solutions”, Branwen et al 2017
 “Banner Ads Considered Harmful”, Branwen 2017
 “The Risk Elicitation Puzzle”, Pedroni et al 2017
 “Was Angelina Jolie Right? Optimizing Cancer Prevention Strategies Among BRCA Mutation Carriers”, Nohdurft et al 2017
 “Internet WiFi Improvement”, Branwen 2016
 “Why Tool AIs Want to Be Agent AIs”, Branwen 2016
 “Candy Japan’s New Box A/B Test”, Branwen 2016
 “Embryo Selection For Intelligence”, Branwen 2016
 “Bitter Melon for Blood Glucose”, Branwen 2015
 “Deep DPG (DDPG): Continuous Control With Deep Reinforcement Learning”, Lillicrap et al 2015
 “The Unfavorable Economics of Measuring the Returns to Advertising”, Lewis & Rao 2015
 “When Should I Check The Mail?”, Branwen 2015
 “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, Ioffe & Szegedy 2015
 “Selectiongain: an R Package for Optimizing Multistage Selection”, Mi et al 2015
 “Focusing on the Long-term: It’s Good for Users and Business”, Hohnhold et al 2015
 “Thompson Sampling With the Online Bootstrap”, Eckles & Kaptein 2014
 “Statistical Notes”, Branwen 2014
 “Playing Atari With Deep Reinforcement Learning”, Mnih et al 2013
 “On the Near Impossibility of Measuring the Returns to Advertising”, Lewis & Rao 2013
 “Caffeine Wakeup Experiment”, Branwen 2013
 “Experimental Design for Partially Observed Markov Decision Processes”, Thorbergsson & Hooker 2012
 “Rerandomization to Improve Covariate Balance in Experiments”, Morgan & Rubin 2012
 “Timing Technology: Lessons From The Media Lab”, Branwen 2012
 “A/B Testing Long-form Readability on Gwern.net”, Branwen 2012
 “Redshift Sleep Experiment”, Branwen 2012
 “Learning Is Planning: near Bayes-optimal Reinforcement Learning via Monte-Carlo Tree Search”, Asmuth & Littman 2012
 “Why Philosophers Should Care About Computational Complexity”, Aaronson 2011
 “Does Retail Advertising Work? Measuring the Effects of Advertising on Sales Via a Controlled Experiment on Yahoo!”, Lewis & Reiley 2011
 “PILCO: A Model-Based and Data-Efficient Approach to Policy Search”, Deisenroth & Rasmussen 2011
 “Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising”, Lewis et al 2011
 “Improving Vineyard Sampling Efficiency via Dynamic Spatially Explicit Optimisation”, Meyers et al 2011
 “The Time Resolution of the St Petersburg Paradox”, Peters 2011
 “How to Improve R&D Productivity: the Pharmaceutical Industry's Grand Challenge”, Paul et al 2010
 “Drug Harms in the UK: a Multicriteria Decision Analysis”, Nutt et al 2010
 “Adversarial Risk Analysis”, Insua et al 2009
 “Retrospectives: Guinnessometrics: The Economic Foundation of “Student’s” T”, Ziliak 2008
 “The Guidelines Manual: Chapter 8: Incorporating Health Economics in Guidelines and Assessing Resource Impact”, NICE 2007
 “On the Evolution of Investment Strategies and the Kelly Rule—A Darwinian Approach”, Lensberg & Schenk-Hoppé 2007
 “Information Systems Project Continuation in Escalation Situations: A Real Options Model”, Tiwana et al 2006
 “Decision by Sampling”, Stewart et al 2006
 “The Optimizer’s Curse: Skepticism and Postdecision Surprise in Decision Analysis”, Smith & Winkler 2006
 “Investing in the Unknown and Unknowable”, Zeckhauser 2006
 “The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market”, Thorp 2006
 “Good and Real: Demystifying Paradoxes from Physics to Ethics”, Drescher 2006
 “Policy Mining: Learning Decision Policies from Fixed Sets of Data”, Zadrozny 2003
 “John W. Tukey: His Life and Professional Contributions”, Brillinger 2002
 “Stigler’s Diet Problem Revisited”, Garille & Gass 2001
 “Should We Take Measurements at an Intermediate Design Point?”, Gelman 2000
 “Comparing Classifiers When the Misallocation Costs Are Uncertain”, Adams & Hand 1999
 “Adding Risks: Samuelson's Fallacy of Large Numbers Revisited”, Ross 1999
 “Information Theory and an Extension of the Maximum Likelihood Principle”, Akaike 1998
 “‘Improving Ratings’: Audit in the British University System”, Strathern 1997
 “The 'Awful Idea of Accountability': Inscribing People into the Measurement of Objects”, Hoskin 1996
 “Seeing The Forest From The Trees: When Predicting The Behavior Or Status Of Groups, Correlate Means”, Lubinski & Humphreys 1996b
 “Processing Linguistic Probabilities: General Principles and Empirical Evidence”, Budescu & Wallsten 1995
 “Computer Based Horse Race Handicapping and Wagering Systems: A Report”, Hausch et al 1994
 “Bayesian Updating in Hierarchic Markov Processes Applied to the Animal Replacement Problem”, Kristensen 1993
 “Learning from Coarse Information: Biased Contests and Career Profiles”, Meyer 1991
 “Weight or the Value of Knowledge”, Ramsey 1990
 “'Student': A Statistical Biography of William Sealy Gosset”, Pearson et al 1990
 “F. P. Ramsey: Philosophical Papers”, Ramsey & Mellor 1990
 “The Total Evidence Theorem for Probability Kinematics”, Graves 1989
 “Nonlinear Preference and Utility Theory”, Fishburn 1988
 “Measuring the Vague Meanings of Probability Terms”, Wallsten et al 1986
 “An Examination of Two Alternative Techniques to Estimate the Standard Deviation of Job Performance in Dollars”, Reilly & Smither 1985
 “Game Theoretic Analysis of a Bankruptcy Problem from the Talmud”, Aumann & Maschler 1985
 “Influence Diagrams”, Howard & Matheson 1984
 “The Citation Bias: Fad and Fashion in the Judgment and Decision Literature”, Christensen-Szalanski & Beach 1984
 “Readings on the Principles and Applications of Decision Analysis: Volume 2: Professional Collection”, Howard & Matheson 1983
 “Readings on the Principles and Applications of Decision Analysis: Volume 1: General Collection”, Howard & Matheson 1983
 “Multi-Bayesian Statistical Decision Theory”, Weerahandi & Zidek 1981
 “Impact of Valid Selection Procedures on Workforce Productivity”, Schmidt et al 1979
 “Science and Statistics”, Box 1976
 “When Values Conflict: Essays on Environmental Analysis, Discourse, and Decision”, Tribe et al 1976
 “Boundaries of Analysis: An Inquiry into the Tocks Island Dam Controversy”, Feiveson et al 1976
 “Portfolio Choice and the Kelly Criterion”, Thorp 1975
 “Cross-Modality Matching of Money Against Other Continua”, Galanter & Pliner 1974
 “The General Impossibility of Normative Accounting Standards”, Demski 1973
 “The Theory of Social Choice”, Fishburn 1973
 “What Makes for a Beautiful Problem in Science?”, Samuelson 1970
 “General Proof That Diversification Pays”, Samuelson 1967
 “Optimal Dairy Cow Replacement Policies”, Giaever 1966
 “Measuring Utility by a Single-response Sequential Method”, Becker et al 1964
 “A Model for Selecting One of Two Medical Treatments”, Colton 1963
 “Studies of War, Nuclear and Conventional”, Blackett 1962
 “Applied Statistical Decision Theory”, Raiffa & Schlaifer 1961
 “Gradient Theory of Optimal Flight Paths”, Kelley 1960
 “Testing Statistical Hypotheses (First Edition)”, Lehmann 1959
 “Probability and Statistics for Business Decisions: An Introduction to Managerial Economics Under Uncertainty”, Schlaifer 1959
 “An Optimum Character Recognition System Using Decision Functions”, Chow 1957
 “Unsolved Problems of Experimental Statistics”, Tukey 1954
 “Non-Cooperative Games”, Nash 1951
 “The Economic Life of Industrial Equipment”, Preinreich 1940
 “"Student" As Statistician”, Pearson 1939
 “Presidential Address to the First Indian Statistical Congress”, Fisher 1938
 “The Lanarkshire Milk Experiment”, Elderton 1933
 “Pasteurised and Raw Milk”, Fisher & Bartlett 1931
 “On Testing Varieties of Cereals”, Gosset 1923
 Thue-Morse sequence
 Thompson sampling
 Optimal stopping
 Multi-objective optimization
 Monte Carlo tree search
 Miscellaneous
See Also
Links
“A Systematic Review of Human Challenge Trials, Designs, and Safety”, Adams-Phipps et al 2022 (2022-03-21):
Background: There exists no prior systematic review of human challenge trials (HCTs) that focuses on participant safety. Key questions regarding HCTs include how risky such trials have been, how often adverse events (AEs) and serious adverse events (SAEs) occur, and whether risk mitigation measures have been effective.
Methods: A systematic search of PubMed and PubMed Central for articles reporting on results of HCTs published between 1980 and 2021 was performed and completed by 2021-10-07.
Results: Of 2,838 articles screened, 276 were reviewed in full. 15,046 challenged participants were described in 308 studies that met inclusion criteria. 286 (92.9%) of these studies reported mitigation measures used to minimize risk to the challenge population. Among 187 studies which reported on SAEs, 0.2% of participants experienced at least one challenge-related SAE. Among 94 studies that graded AEs by severity, challenge-related AEs graded “severe” were reported by between 5.6% and 15.8% of participants. AE data were provided as a range to account for unclear reporting. 80% of studies published after 2010 were registered in a trials database.
Conclusion: HCTs are increasingly common and used for an expanding list of diseases. Although AEs occur, severe AEs and SAEs are rare. Reporting has improved over time, though not all papers provide a comprehensive report of relevant health impacts. From the available data, most HCTs do not lead to a high number of severe symptoms or SAEs.
This study was preregistered on PROSPERO as CRD42021247218.
“The InterModel Vigorish (IMV): A Flexible and Portable Approach for Quantifying Predictive Accuracy With Binary Outcomes”, Domingue et al 2022 (2022-01-12):
[Twitter; app] Understanding the “fit” of models designed to predict binary outcomes has been a longstanding problem.
We propose a flexible, portable, and intuitive metric for quantifying the change in accuracy between 2 predictive systems in the case of a binary outcome, the InterModel Vigorish (IMV). The IMV is based on an analogy to well-characterized physical systems with tractable probabilities: weighted coins. The IMV is always a statement about the change in fit relative to some baseline—which can be as simple as the prevalence—whereas other metrics are standalone measures that need to be further manipulated to yield indices related to differences in fit across models. Moreover, the IMV is consistently interpretable independent of baseline prevalence.
We illustrate the flexible properties of this metric in numerous simulations and showcase its flexibility across examples spanning the social, biomedical, and physical sciences.
[Keywords: binary outcomes, fit index, logistic regression, prediction, Kelly criterion, entropy, coherence]
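[Illustration: a minimal Python sketch of the weighted-coin idea as described above, not the paper’s reference implementation. It maps each model’s mean log-likelihood on the binary outcomes to the entropy-equivalent coin weight w ≥ 0.5 and reports the relative gain over a baseline (here, the prevalence); the helper names coin_weight/imv are hypothetical, and the paper’s exact definition may differ in details:]

    import numpy as np
    from scipy.optimize import brentq

    def coin_weight(p_pred, y):
        # Mean log-likelihood of the predicted probabilities on 0/1 outcomes...
        ll = np.mean(y * np.log(p_pred) + (1 - y) * np.log(1 - p_pred))
        # ...mapped to the single coin weight w >= 0.5 with the same expected
        # log-likelihood (assumes the model beats a fair coin: ll > log 0.5).
        return brentq(lambda w: w * np.log(w) + (1 - w) * np.log(1 - w) - ll,
                      0.5 + 1e-9, 1 - 1e-9)

    def imv(p_baseline, p_enhanced, y):
        w0, w1 = coin_weight(p_baseline, y), coin_weight(p_enhanced, y)
        return (w1 - w0) / w0  # relative gain over the baseline coin

    y = np.array([0, 1, 1, 0, 1, 1, 0, 1])
    p_base = np.full(len(y), y.mean())   # prevalence-only baseline
    p_new = np.where(y == 1, 0.8, 0.3)   # hypothetical better predictions
    print(imv(p_base, p_new, y))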
“The Geometry of Decision-making in Individuals and Collectives”, Sridhar et al 2021 (2021-12-14):
Almost all animals must make decisions on the move. Here, employing an approach that integrates theory and high-throughput experiments (using state-of-the-art virtual reality), we reveal that there exist fundamental geometrical principles that result from the inherent interplay between movement and organisms’ internal representation of space. Specifically, we find that animals spontaneously reduce the world into a series of sequential binary decisions, a response that facilitates effective decision-making and is robust both to the number of options available and to context, such as whether options are static (eg. refuges) or mobile (eg. other animals). We present evidence that these same principles, hitherto overlooked, apply across scales of biological organization, from individual to collective decision-making.
Choosing among spatially distributed options is a central challenge for animals, from deciding among alternative potential food sources or refuges to choosing with whom to associate. Using an integrated theoretical and experimental approach (employing immersive virtual reality), we consider the interplay between movement and vectorial integration during decision-making regarding 2, or more, options in space.
In computational models of this process, we reveal the occurrence of spontaneous and abrupt “critical” transitions (associated with specific geometrical relationships) whereby organisms spontaneously switch from averaging vectorial information among, to suddenly excluding one among, the remaining options. This bifurcation process repeats until only one option—the one ultimately selected—remains. Thus, we predict that the brain repeatedly breaks multichoice decisions into a series of binary decisions in spacetime.
Experiments with fruit flies, desert locusts, and larval zebrafish reveal that they exhibit these same bifurcations, demonstrating that across taxa and ecological contexts, there exist fundamental geometric principles that are essential to explain how, and why, animals move the way they do.
[Keywords: ring attractor, movement ecology, navigation, collective behavior, embodied choice]
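[Illustration: the “average, then bifurcate” geometry can be caricatured with a toy two-option steering rule. This is only the geometric skeleton of the idea, not the paper’s ring-attractor model; the critical angle and step size below are arbitrary assumptions:]

    import numpy as np

    def trajectory(target_a, target_b, critical_angle_deg=60, step=0.05):
        # Steer along the average of the unit vectors to the remaining targets;
        # once their angular separation exceeds a critical angle, commit to the
        # target best aligned with the current heading (the abrupt bifurcation).
        pos, heading = np.zeros(2), np.array([0.0, 1.0])
        targets = [np.asarray(target_a, float), np.asarray(target_b, float)]
        path = [pos.copy()]
        while np.linalg.norm(targets[0] - pos) > 0.1:
            dirs = [(t - pos) / np.linalg.norm(t - pos) for t in targets]
            if len(targets) == 2:
                cos_sep = np.clip(dirs[0] @ dirs[1], -1.0, 1.0)
                if np.degrees(np.arccos(cos_sep)) > critical_angle_deg:
                    keep = int(np.argmax([d @ heading for d in dirs]))
                    targets, dirs = [targets[keep]], [dirs[keep]]
            heading = np.mean(dirs, axis=0)
            heading = heading / np.linalg.norm(heading)
            pos = pos + step * heading
            path.append(pos.copy())
        return np.array(path)

    # The agent first heads toward the midpoint between the two targets, then
    # abruptly commits to one of them once the separation angle grows too wide.
    print(trajectory((-1.0, 3.0), (1.0, 3.0))[-1])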
“Noise Increases Anchoring Effects”, Lee & Morewedge 2021 (2021-12-08):
We introduce a theoretical framework distinguishing between anchoring effects, anchoring bias, and judgmental noise: Anchoring effects require anchoring bias, but noise modulates their size. We tested this framework by manipulating stimulus magnitudes. As magnitudes increase, psychophysical noise due to scalar variability widens the perceived range of plausible values for the stimulus. This increased noise, in turn, increases the influence of anchoring bias on judgments. In 11 preregistered experiments (n = 3,552 adults), anchoring effects increased with stimulus magnitude for point estimates of familiar and novel stimuli (eg. reservation prices for hotels and donuts, counts in dot arrays). Comparisons of relevant and irrelevant anchors showed that noise itself did not produce anchoring effects. Noise amplified anchoring bias. Our findings identify a stimulus feature predicting the size and replicability of anchoring effects—stimulus magnitude. More broadly, we show how to use psychophysical noise to test relationships between bias and noise in judgment under uncertainty.
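[Illustration: the bias-versus-noise framework is easy to simulate. Below is a toy anchor-and-adjust model of my own, not the authors’ code: the judge perceives the stimulus with scalar variability (SD proportional to magnitude) and adjusts away from the anchor only to the edge of the plausible range around the percept, so larger magnitudes, and hence more noise, leave estimates stuck closer to the anchor:]

    import numpy as np

    rng = np.random.default_rng(0)

    def mean_estimate(true_value, anchor, cv=0.2, z=1.5, n=100_000):
        # Scalar variability: perceptual SD grows in proportion to magnitude.
        sd = cv * true_value
        percept = rng.normal(true_value, sd, n)
        # Adjust from the anchor, but stop at the edge of the plausible range.
        return np.clip(anchor, percept - z * sd, percept + z * sd).mean()

    for magnitude in (10, 100, 1000):
        effect = (mean_estimate(magnitude, anchor=2.0 * magnitude)
                  - mean_estimate(magnitude, anchor=0.5 * magnitude))
        print(f"magnitude {magnitude:5d}: high-minus-low anchoring effect ~ {effect:.1f}")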
“Prior Knowledge Elicitation: The Past, Present, and Future”, Mikkola et al 2021 (2021-12-01):
Specification of the prior distribution for a Bayesian model is a central part of the Bayesian workflow for data analysis, but it is often difficult even for statistical experts. Prior elicitation transforms domain knowledge of various kinds into well-defined prior distributions, and offers a solution to the prior specification problem, in principle. In practice, however, we are still fairly far from having usable prior elicitation tools that could significantly influence the way we build probabilistic models in academia and industry. We lack elicitation methods that integrate well into the Bayesian workflow and perform elicitation efficiently in terms of costs of time and effort. We even lack a comprehensive theoretical framework for understanding different facets of the prior elicitation problem.
Why are we not widely using prior elicitation? We analyze the state of the art by identifying a range of key aspects of prior knowledge elicitation, from properties of the modelling task and the nature of the priors to the form of interaction with the expert. The existing prior elicitation literature is reviewed and categorized in these terms. This allows recognizing understudied directions in prior elicitation research, finally leading to a proposal of several new avenues to improve prior elicitation methodology.
“νSDDP: Neural Stochastic Dual Dynamic Programming”, Dai et al 2021 (2021-12-01):
Stochastic dual dynamic programming (SDDP) is a state-of-the-art method for solving multistage stochastic optimization, widely used for modeling real-world process optimization tasks. Unfortunately, SDDP has a worst-case complexity that scales exponentially in the number of decision variables, which severely limits applicability to only low dimensional problems.
To overcome this limitation, we extend SDDP by introducing a trainable neural model that learns to map problem instances to a piecewise linear value function within intrinsic low-dimension space, which is architected specifically to interact with a base SDDP solver, so that can accelerate optimization performance on new instances. The proposed Neural Stochastic Dual Dynamic Programming (νSDDP) continually self-improves by solving successive problems.
An empirical investigation demonstrates that νSDDP can substantially reduce problem solving cost without sacrificing solution quality over competitors such as SDDP and reinforcement learning algorithms, across a range of synthetic and real-world process optimization problems.
“A Rational Reinterpretation of Dual-process Theories”, Milli et al 2021 (2021-12-01):
Highly influential “dual-process” accounts of human cognition postulate the coexistence of a slow accurate system with a fast error-prone system. But why would there be just 2 systems rather than, say, one or 93?
Here, we argue that a dual-process architecture might reflect a rational tradeoff between the cognitive flexibility afforded by multiple systems and the time and effort required to choose between them. We investigate how the optimal set and number of cognitive systems depend on the structure of the environment.
We find that the optimal number of systems depends on the variability of the environment and the difficulty of deciding when to use which system. Furthermore, we find that there is a plausible range of conditions under which it is optimal to be equipped with a fast system that performs no deliberation (“System 1”) and a slow system that achieves a higher expected accuracy through deliberation (“System 2”).
Our findings thereby suggest a rational reinterpretation of dualprocess theories.
[Keywords: bounded rationality, dual-process theories, meta-decision making, bounded optimality, meta-reasoning, resource-rationality]
…We study this problem in 4 different domains where the dual systems framework has been applied to explain human decision-making: binary choice, planning, strategic interaction, and multi-alternative, multi-attribute risky choice. We investigate how the optimal cognitive architecture for each domain depends on the variability of the environment and the cost of choosing between multiple cognitive systems, which we call meta-reasoning cost.
“Strategically Overconfident (to a Fault): How Self-promotion Motivates Advisor Confidence”, Van Zant 2021 (2021-11-01):
Unlike judgments made in private, advice contexts invoke strategic social concerns that might increase overconfidence in advice. Many scholars have assumed that overconfident advice emerges as an adaptive response to advice seekers’ preference for confident advice and failure to punish overconfidence. However, another possibility is that advisors robustly display overconfidence as a self-promotion tactic—even when it is punished by others.
Across 4 experiments and a survey of advice professionals, the current research finds support for this account. First, it shows that advisors express more overconfidence than private decision-makers. This pattern held even after advice recipients punished advisors for their overconfidence. Second, it identifies the underlying motivations of advisors’ overconfidence. Advisors’ overconfidence was not driven by self-deception or a sincere desire to be helpful. Instead, it reflected strategic self-promotion.
Relative to the overconfidence revealed by their private beliefs, advisors purposely increased their overconfidence while broadcasting judgments when (a) it was salient that others would assess their competence and (b) looking competent served their self-interest.
“TV Advertising Effectiveness and Profitability: Generalizable Results From 288 Brands”, Shapiro et al 2021 (2021-07-26):
We estimate the distribution of television advertising elasticities and the distribution of the advertising return on investment (ROI) for a large number of products in many categories…We construct a data set by merging market (DMA) level TV advertising data with retail sales and price data at the brand level…Our identification strategy is based on the institutions of the ad buying process.
Our results reveal substantially smaller advertising elasticities compared to the results documented in the literature, as well as a sizable percentage of statistically insignificant or negative estimates. The results are robust to functional form assumptions and are not driven by insufficient statistical power or measurement error.
The ROI analysis shows negative ROIs at the margin for more than 80% of brands, implying overinvestment in advertising by most firms. Further, the overall ROI of the observed advertising schedule is only positive for one third of all brands.
[Keywords: advertising, return on investment, empirical generalizations, agency issues, consumer packaged goods, media markets]
…We find that the mean and median of the distribution of estimated long-run own-advertising elasticities are 0.023 and 0.014, respectively, and 2 thirds of the elasticity estimates are not statistically different from zero. These magnitudes are considerably smaller than the results in the extant literature. The results are robust to controls for own and competitor prices and feature and display advertising, and the advertising effect distributions are similar whether a carryover parameter is assumed or estimated. The estimates are also robust if we allow for a flexible functional form for the advertising effect, and they do not appear to be driven by measurement error. As we are not able to include all sensitivity checks in the paper, we created an interactive web application that allows the reader to explore all model specifications. The web application is available.
…First, the advertising elasticity estimates in the baseline specification are small. The median elasticity is 0.0140, and the mean is 0.0233. These averages are substantially smaller than the average elasticities reported in extant meta-analyses of published case studies (Assmus, Farley, and Lehmann (1984b), Sethuraman, Tellis, and Briesch (2011)). Second, 2 thirds of the estimates are not statistically distinguishable from zero. We show in Figure 2 that the most precise estimates are those closest to the mean and the least precise estimates are in the extremes.
…6.1 Average ROI of Advertising in a Given Week:
In the first policy experiment, we measure the ROI of the observed advertising levels (in all DMAs) in a given week t relative to not advertising in week t. For each brand, we compute the corresponding ROI for all weeks with positive advertising, and then average the ROIs across all weeks to compute the average ROI of weekly advertising. This metric reveals if, on the margin, firms choose the (approximately) correct advertising level or could increase profits by either increasing or decreasing advertising.
We provide key summary statistics in the top panel of Table III, and we show the distribution of the predicted ROIs in Figure 3(a). The average ROI of weekly advertising is negative for most brands over the whole range of assumed manufacturer margins. At a 30% margin, the median ROI is −88.15%, and only 12% of brands have positive ROI. Further, for only 3% of brands the ROI is positive and statistically different from zero, whereas for 68% of brands the ROI is negative and statistically different from zero.
These results provide strong evidence for overinvestment in advertising at the margin. [In Appendix C.3, we assess how much larger the TV advertising effects would need to be for the observed level of weekly advertising to be profitable. For the median brand with a positive estimated ad elasticity, the advertising effect would have to be 5.33× larger for the observed level of weekly advertising to yield a positive ROI (assuming a 30% margin).]
6.2 Overall ROI of the Observed Advertising Schedule: In the second policy experiment, we investigate if firms are better off when advertising at the observed levels versus not advertising at all. Hence, we calculate the ROI of the observed advertising schedule relative to a counterfactual baseline with zero advertising in all periods.
We present the results in the bottom panel of Table III and in Figure 3(b). At a 30% margin, the median ROI is −57.34%, and 34% of brands have a positive return from the observed advertising schedule versus not advertising at all. Whereas 12% of brands only have positive and 30% of brands only negative values in their confidence intervals, there is more uncertainty about the sign of the ROI for the remaining 58% of brands. This evidence leaves open the possibility that advertising may be valuable for a substantial number of brands, especially if they reduce advertising on the margin.
…Our results have important positive and normative implications. Why do firms spend billions of dollars on TV advertising each year if the return is negative? There are several possible explanations. First, agency issues, in particular career concerns, may lead managers (or consultants) to overstate the effectiveness of advertising if they expect to lose their jobs if their advertising campaigns are revealed to be unprofitable. Second, an incorrect prior (ie. conventional wisdom that advertising is typically effective) may lead a decision maker to rationally shrink the estimated advertising effect from their data to an incorrect, inflated prior mean. These proposed explanations are not mutually exclusive. In particular, agency issues may be exacerbated if the general effectiveness of advertising or a specific advertising effect estimate is overstated. [Another explanation is that many brands have objectives for advertising other than stimulating sales. This is a nonstandard objective in economic analysis, but nonetheless, we cannot rule it out.] While we cannot conclusively point to these explanations as the source of the documented overinvestment in advertising, our discussions with managers and industry insiders suggest that these may be contributing factors.
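[Illustration: the arithmetic that turns such a small elasticity into a deeply negative marginal ROI is simple. The sketch below uses the paper’s median elasticity (~0.014) and a 30% margin, but the revenue and TV budget are purely hypothetical; it is back-of-the-envelope arithmetic, not the paper’s estimator:]

    def marginal_roi(elasticity, margin, revenue, ad_spend, bump=0.01):
        # ROI of raising TV ad spend by `bump` (e.g. 1%), holding all else fixed:
        # extra revenue = elasticity * (% change in ads) * revenue.
        extra_profit = margin * elasticity * bump * revenue
        extra_cost = bump * ad_spend
        return extra_profit / extra_cost - 1

    # Hypothetical brand: $100M annual revenue, $5M TV budget, 30% margin,
    # own-advertising elasticity 0.014 (the paper's median estimate).
    print(marginal_roi(elasticity=0.014, margin=0.30, revenue=100e6, ad_spend=5e6))
    # -> about -0.92, i.e. roughly a -92% return on the marginal advertising dollar.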
“Learning to Hesitate”, Descamps et al 2021 (2021-06-22):
We investigate how people make choices when they are unsure about the value of the options they face and have to decide whether to choose now or wait and acquire more information first.
In an experiment, we find that participants deviate from optimal information acquisition in a systematic manner. They acquire too much information (when they should only collect little) or not enough (when they should collect a lot). We show that this pattern can be explained as naturally emerging from Fechner cognitive errors. Over time participants tend to learn to approximate the optimal strategy when information is relatively costly.
[Keywords: search, decision under uncertainty, information, optimal stopping, real option]
…We design a controlled situation where individuals have to choose between 2 alternatives with uncertain payoffs. Before making a choice, they have the opportunity to wait and collect additional (costly) pieces of information which help them get a better idea of the likely alternatives’ payoffs. The design of the experiment allows us to precisely identify the optimal sequential sampling strategy and to assess whether participants are able to approximate it.
We find that participants deviate in systematic ways from the optimal strategy. They tend to hesitate too long and oversample information when it is relatively costly, and therefore when the optimal strategy is to collect only little information. On the contrary, they tend to undersample information when it is relatively cheap, and therefore when the optimal strategy is to collect a lot of information. We show that this pattern of oversampling and undersampling can be explained as the result of Fechner cognitive errors which introduce stochasticity in decisions about whether or not to stop. Cognitive errors create a risk to stop at any time by mistake. When the optimal level of information to acquire is high, DMs should continue to sample information for a long time. As a consequence, errors are likely to lead to stop too early, and therefore to undersampling. When the optimal level of evidence to acquire is low, DMs should stop sampling early. In that case, cognitive errors are more likely to lead to fail to stop early enough, and therefore to oversampling. The deviations we observe, lead participants to lose between 10 and 25% of their potential payoff. However, participants learn to get closer to the optimal strategy over time, as long as information is relatively costly.
“Informational Herding, Optimal Experimentation, and Contrarianism”, Smith et al 2021 (2021-02-25):
In the standard herding model, privately informed individuals sequentially see prior actions and then act. An identical action herd eventually starts and public beliefs tend to “cascade sets” where social learning stops. What behaviour is socially efficient when actions ignore informational externalities?
We characterize the outcome that maximizes the discounted sum of utilities. Our 4 key findings are:
1. Cascade sets shrink but do not vanish, and herding should occur but less readily as greater weight is attached to posterity.
2. An optimal mechanism rewards individuals mimicked by their successor.
3. Cascades cannot start after period one under a signal log-concavity condition.
4. Given this condition, efficient behaviour is contrarian, leaning against the myopically more popular actions in every period.
We make 2 technical contributions: as value functions with learning are not smooth, we use monotone comparative statics under uncertainty to deduce optimal dynamic behaviour. We also adapt dynamic pivot mechanisms to Bayesian learning.
[Keywords: herding, mimicking, contrarian, cascade, efficiency, monotonicity, log-concavity]
“Adversarial Vulnerabilities of Human Decision-making”, Dezfouli et al 2020 (2020-11-04):
“What I cannot efficiently break, I cannot understand.” Understanding the vulnerabilities of human choice processes allows us to detect and potentially avoid adversarial attacks. We develop a general framework for creating adversaries for human decision-making. The framework is based on recent developments in deep reinforcement learning models and recurrent neural networks and can in principle be applied to any decision-making task and adversarial objective. We show the performance of the framework in 3 tasks involving choice, response inhibition, and social decision-making. In all of the cases the framework was successful in its adversarial attack. Furthermore, we show various ways to interpret the models to provide insights into the exploitability of human choice.
Adversarial examples are carefully crafted input patterns that are surprisingly poorly classified by artificial and/or natural neural networks. Here we examine adversarial vulnerabilities in the processes responsible for learning and choice in humans. Building upon recent recurrent neural network models of choice processes, we propose a general framework for generating adversarial opponents that can shape the choices of individuals in particular decision-making tasks toward the behavioral patterns desired by the adversary. We show the efficacy of the framework through 3 experiments involving action selection, response inhibition, and social decision-making. We further investigate the strategy used by the adversary in order to gain insights into the vulnerabilities of human choice. The framework may find applications across behavioral sciences in helping detect and avoid flawed choice.
[Keywords: decision-making, recurrent neural networks, reinforcement learning]
“Targeting for Long-term Outcomes”, Yang et al 2020 (2020-10-29):
Decision-makers often want to target interventions (eg. marketing campaigns) so as to maximize an outcome that is observed only in the long-term. This typically requires delaying decisions until the outcome is observed or relying on simple short-term proxies for the long-term outcome. Here we build on the statistical surrogacy and off-policy learning literature to impute the missing long-term outcomes and then approximate the optimal targeting policy on the imputed outcomes via a doubly-robust approach.
We apply our approach in large-scale proactive churn management experiments at The Boston Globe by targeting optimal discounts to its digital subscribers to maximize their long-term revenue.
We first show that conditions for validity of average treatment effect estimation with imputed outcomes are also sufficient for valid policy evaluation and optimization; furthermore, these conditions can be somewhat relaxed for policy optimization.
We then validate this approach empirically by comparing it with a policy learned on the ground truth long-term outcomes and show that they are statistically indistinguishable. Our approach also outperforms a policy learned on short-term proxies for the long-term outcome. In a second field experiment, we implement the optimal targeting policy with additional randomized exploration, which allows us to update the optimal policy for each new cohort of customers to account for potential nonstationarity.
Over 3 years, our approach had a net-positive revenue impact in the range of $4–$5 million compared to The Boston Globe’s current policies.
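[Illustration: the policy-evaluation step they build on can be sketched with a generic doubly-robust value estimate on logged randomized data, with the long-term outcome already imputed via surrogates. This is the textbook form, not necessarily the authors’ exact estimator:]

    import numpy as np

    def dr_policy_value(policy_action, logged_action, propensity, outcome, mu_hat):
        # Doubly-robust off-policy value estimate.
        #   policy_action[i] : action the candidate targeting policy would take
        #   logged_action[i] : action actually taken in the experiment
        #   propensity[i]    : P(logged_action[i] | X_i) under the logging policy
        #   outcome[i]       : observed (or surrogate-imputed) long-term outcome
        #   mu_hat[i, a]     : regression estimate of the outcome under action a
        n = len(logged_action)
        direct = mu_hat[np.arange(n), policy_action]
        ipw = (logged_action == policy_action) / propensity
        correction = ipw * (outcome - mu_hat[np.arange(n), logged_action])
        return float(np.mean(direct + correction))

    # Tiny example: 2 actions (0 = no discount, 1 = discount), 4 customers.
    mu = np.array([[5.0, 7.0], [6.0, 6.5], [4.0, 4.2], [8.0, 7.5]])
    print(dr_policy_value(policy_action=np.array([1, 1, 0, 0]),
                          logged_action=np.array([0, 1, 1, 0]),
                          propensity=np.full(4, 0.5),
                          outcome=np.array([5.2, 6.8, 4.1, 8.3]),
                          mu_hat=mu))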
“Learning Not to Learn: Nature versus Nurture in Silico”, Lange & Sprekeler 2020 (2020-10-09):
Animals are equipped with a rich innate repertoire of sensory, behavioral and motor skills, which allows them to interact with the world immediately after birth. At the same time, many behaviors are highly adaptive and can be tailored to specific environments by means of learning. In this work, we use mathematical analysis and the framework of meta-learning (or ‘learning to learn’) to answer when it is beneficial to learn such an adaptive strategy and when to hardcode a heuristic behavior. We find that the interplay of ecological uncertainty, task complexity and the agents’ lifetime has crucial effects on the meta-learned amortized Bayesian inference performed by an agent. There exist two regimes: One in which meta-learning yields a learning algorithm that implements task-dependent information-integration and a second regime in which meta-learning imprints a heuristic or ‘hardcoded’ behavior. Further analysis reveals that non-adaptive behaviors are not only optimal for aspects of the environment that are stable across individuals, but also in situations where an adaptation to the environment would in fact be highly beneficial, but could not be done quickly enough to be exploited within the remaining lifetime. Hardcoded behaviors should hence not only be those that always work, but also those that are too complex to be learned within a reasonable time frame.
“Robust Decision Theory and Econometrics”, Chamberlain 2020 (2020-08-01):
This review uses the empirical analysis of portfolio choice to illustrate econometric issues that arise in decision problems. Subjective expected utility (SEU) can provide normative guidance to an investor making a portfolio choice. The investor, however, may have doubts on the specification of the distribution and may seek a decision theory that is less sensitive to the specification. I consider three such theories: maxmin expected utility, variational preferences (including multiplier and divergence preferences and the associated constraint preferences), and smooth ambiguity preferences. I use a simple two-period model to illustrate their application. Normative empirical work on portfolio choice is mainly in the SEU framework, and bringing in ideas from robust decision theory may be fruitful.
“Speed-accuracy Tradeoff in Plants”, Ceccarini et al 2020 (2020-06-15):
Speed-accuracy tradeoff (SAT) is the tendency for decision speed to covary with decision accuracy. SAT is an inescapable property of aimed movements being present in a wide range of species, from insects to primates. An aspect that remains unsolved is whether SAT extends to plants’ movement.
Here, we tested this possibility by examining the swaying in circles of the tips of shoots exhibited by climbing plants (Pisum sativum L.) as they approach to grasp a potential support. In particular, by means of 3-dimensional kinematical analysis, we investigated whether climbing plants scale movement velocity as a function of the difficulty to coil a support.
Results showed that plants are able to process the properties of the support before contact and, similarly to animal species, strategically modulate movement velocity according to task difficulty.
…To date, a great absent in the Fitts’s law literature is the “green kingdom.” At first glance, plants seem relatively immobile, stuck to the ground in rigid structures and, unlike animals, unable to escape stressful environments. But, although markedly different from those of animals, movement pervades all aspects of plant behavior (Darwin & Darwin 1880). As observed by Darwin 1875, the tendrils of climbing plants undergo subtle movements around their axes of elongation. This elliptical movement, known as circumnutation, allows plants to explore their immediate surroundings in search, for instance, of a physical support to enhance light acquisition (Larson 2000). Also, Darwin (1875; see also Trewavas 2017) observed that the tendrils tend to assume the shape of whatever surface before they come into contact with. Implicitly this might signify that they “see” the support and plan the movement accordingly. In this view, climbing plants might be able to plan the course of an action ahead of time and program the tendrils’ choreography according to the “to-be-grasped” object.
Support for this contention comes from both theoretical and empirical studies suggesting that plant movement is not a simple product of cause-effect mechanisms but rather seems to be driven by processes that are anticipatory in nature (eg. Calvo & Friston 2017; Guerra et al 2019). For instance, a recent study shows that a climbing plant (Pisum sativum L.) not only is able to perceive a potential support, but it also scales the kinematics of tendrils’ aperture according to its size well ahead they touch the stimulus (Guerra et al 2019). This has been taken as the demonstration that plants plan the movement purposefully and in ways that are flexible and anticipatory.
With this in mind, one of the empirical predictions stemming from Fitts’s law can be well-suited to model the 3-dimensional circumnutation of plants. Precisely, we refer to the evidence that movement time scales as a function of the target’s size: When the distance is constant, thinner targets are reached more slowly than thicker ones (see Murata & Iwase 2001). We test this prediction in Pisum sativum L. by assessing the change of velocity of the tendrils during their approach-to-grasp a thin or a thicker support.
…Results…The analysis of movement time confirms this evidence, showing that movement time was shorter for the thinner than for the thicker stimulus (β < 0) with a probability of 79.3%. This evidence suggests that plants are able to process the properties of the support and are endowed with a form of perception underwriting a goal-directed and anticipatory behavior (Guerra et al 2019). However, in contrast with previous human and animal literature (eg. Beggs & Howarth 1972; Fitts 1954; Heitz & Schall 2012), our results indicate an opposite pattern of what Fitts’s law predicts. Remember that according to Fitts’s law, the velocity of the movement is inversely proportional to ID (2D/W). In other words, our results seem to suggest that plants exhibit more difficulty grasping a thicker than a thinner support. These findings are in line with previous reports showing a lower success rate of attachment for thick supports (Peñalosa 1982), and a preference for plants to climb supports with a smaller diameter (Darwin 1875; Putz 1984; Putz & Holbrook 1992 [The Biology of Vines]). Furthermore, by using the curvature of tendrils during the twining phase, Goriely & Neukirch 2006 demonstrate that for thinner supports, the contact angle (ie. the angle between the tip of the tendril and the tangent of the support) is a near-zero value. Instead, with thicker supports, the contact angle tends to increase as tendrils must curl into the support’s surface to maintain an efficient grip. When the support is too thick, the contact angle increases to an extent that the tendril curls back on itself, losing grip. Interestingly, field studies in rainforests showed that the presence of climbing plants tends to decrease in areas in which there is a prevalence of thicker supports (Carrasco-Urra & Gianoli 2009).
A possible explanation for this phenomenon may reside in the fact that, for plants, reaching to grasp thick supports is a more energy-consuming process than grasping for thinner ones. Indeed, the grasping of a thick support implies that plants have to increase the tendril length in order to efficiently coil the support (Rowe et al 2006), and to strengthen the tensional forces to resist gravity (Gianoli 2015).
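[For reference, the Fitts’s-law prediction being tested is that movement time grows with the index of difficulty ID = log2(2D/W), so at a fixed distance D a thinner support (smaller W) should take longer to reach; the pea tendrils showed the opposite. A one-line sketch, with arbitrary example widths rather than the study’s actual supports:]

    import math

    def fitts_index_of_difficulty(distance, width):
        # Fitts's law index of difficulty: ID = log2(2D / W); movement time is
        # modeled as MT = a + b * ID for empirically fitted constants a, b.
        return math.log2(2 * distance / width)

    # Same reaching distance, thick vs. thin support (arbitrary example values):
    print(fitts_index_of_difficulty(distance=10, width=3.0))   # easier (lower ID)
    print(fitts_index_of_difficulty(distance=10, width=1.2))   # harder (higher ID)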
“The Secret History of Facial Recognition: Sixty Years Ago, a Sharecropper’s Son Invented a Technology to Identify Faces. Then the Record of His Role All but Vanished. Who Was Woody Bledsoe, and Who Was He Working For?”, Raviv 2020 (2020-01-21):
Over the following year, Woody came to believe that the most promising path to automated facial recognition was one that reduced a face to a set of relationships between its major landmarks: eyes, ears, nose, eyebrows, lips. The system that he imagined was similar to one that Alphonse Bertillon, the French criminologist who invented the modern mug shot, had pioneered in 1879. Bertillon described people on the basis of 11 physical measurements, including the length of the left foot and the length from the elbow to the end of the middle finger. The idea was that, if you took enough measurements, every person was unique. Although the system was labor-intensive, it worked: In 1897, years before fingerprinting became widespread, French gendarmes used it to identify the serial killer Joseph Vacher. Throughout 1965, Panoramic attempted to create a fully automated Bertillon system for the face. The team tried to devise a program that could locate noses, lips, and the like by parsing patterns of lightness and darkness in a photograph, but the effort was mostly a flop.
…Even with this larger sample size, though, Woody’s team struggled to overcome all the usual obstacles. The computer still had trouble with smiles, for instance, which “distort the face and drastically change interfacial measurements.” Aging remained a problem too, as Woody’s own face proved. When asked to cross-match a photo of Woody from 1945 with one from 1965, the computer was flummoxed. It saw little resemblance between the younger man, with his toothy smile and dark widow’s peak, and the older one, with his grim expression and thinning hair. It was as if the decades had created a different person.
…In 1967, more than a year after his move to Austin, Woody took on one last assignment that involved recognizing patterns in the human face. The purpose of the experiment was to help law enforcement agencies quickly sift through databases of mug shots and portraits, looking for matches…Woody’s main collaborator on the project was Peter Hart, a research engineer in the Applied Physics Laboratory at the Stanford Research Institute. (Now known as SRI International, the institute split from Stanford University in 1970 because its heavy reliance on military funding had become so controversial on campus.) Woody and Hart began with a database of around 800 images—two newsprint-quality photos each of about “400 adult male caucasians”, varying in age and head rotation. (I did not see images of women or people of color, or references to them, in any of Woody’s facial-recognition studies.) Using the RAND tablet, they recorded 46 coordinates per photo, including five on each ear, seven on the nose, and four on each eyebrow. Building on Woody’s earlier experience at normalizing variations in images, they used a mathematical equation to rotate each head into a forward-looking position. Then, to account for differences in scale, they enlarged or reduced each image to a standard size, with the distance between the pupils as their anchor metric. The computer’s task was to memorize one version of each face and use it to identify the other. Woody and Hart offered the machine one of two shortcuts. With the first, known as group matching, the computer would divide the face into features—left eyebrow, right ear, and so on—and compare the relative distances between them. The second approach relied on Bayesian decision theory; it used 22 measurements to make an educated guess about the whole.
In the end, the two programs handled the task about equally well. More important, they blew their human competitors out of the water. When Woody and Hart asked three people to cross-match subsets of 100 faces, even the fastest one took six hours to finish. The CDC 3800 computer completed a similar task in about three minutes, reaching a hundredfold reduction in time. The humans were better at coping with head rotation and poor photographic quality, Woody and Hart acknowledged, but the computer was “vastly superior” at tolerating the differences caused by aging. Overall, they concluded, the machine “dominates” or “very nearly dominates” the humans.
This was the greatest success Woody ever had with his facial-recognition research. It was also the last paper he would write on the subject. The paper was never made public—for “government reasons”, Hart says—which both men lamented. In 1970, two years after the collaboration with Hart ended, a roboticist named Michael Kassler alerted Woody to a facial-recognition study that Leon Harmon at Bell Labs was planning. “I’m irked that this second rate study will now be published and appear to be the best man-machine system available”, Woody replied. “It sounds to me like Leon, if he works hard, will be almost 10 years behind us by 1975.” He must have been frustrated when Harmon’s research made the cover of Scientific American a few years later, while his own, more advanced work was essentially kept in a vault.
“A/B Testing With Fat Tails”, Eduardo M. Azevedo, Alex Deng, José Luis Montiel Olea, Justin M. Rao, E. Glen Weyl (2019-08-09)
“Bayesian Persuasion and Information Design”, Kamenica 2019 (2019-08-01):
A school may improve its students’ job outcomes if it issues only coarse grades. Google can reduce congestion on roads by giving drivers noisy information about the state of traffic. A social planner might raise everyone’s welfare by providing only partial information about solvency of banks. All of this can happen even when everyone is fully rational and understands the datagenerating process. Each of these examples raises questions of what is the (socially or privately) optimal information that should be revealed. In this article, I review the literature that answers such questions.
“Generalizable and Robust TV Advertising Effects”, Shapiro et al 2019 (2019-06-11):
We provide generalizable and robust results on the causal sales effect of TV advertising based on the distribution of advertising elasticities for a large number of products (brands) in many categories. Such generalizable results provide a prior distribution that can improve the advertising decisions made by firms and the analysis and recommendations of antitrust and public policy makers. A single case study cannot provide generalizable results, and hence the marketing literature provides several meta-analyses based on published case studies of advertising effects. However, publication bias results if the research or review process systematically rejects estimates of small, statistically insignificant, or “unexpected” advertising elasticities. Consequently, if there is publication bias, the results of a meta-analysis will not reflect the true population distribution of advertising effects.
To provide generalizable results, we base our analysis on a large number of products and clearly lay out the research protocol used to select the products. We characterize the distribution of all estimates, irrespective of sign, size, or statistical-significance. To ensure generalizability we document the robustness of the estimates. First, we examine the sensitivity of the results to the approach and assumptions made when constructing the data used in estimation from the raw sources. Second, as we aim to provide causal estimates, we document if the estimated effects are sensitive to the identification strategies that we use to claim causality based on observational data. Our results reveal substantially smaller effects of own-advertising compared to the results documented in the extant literature, as well as a sizable percentage of statistically insignificant or negative estimates. If we only select products with statistically-significant and positive estimates, the mean or median of the advertising effect distribution increases by a factor of about five.
The results are robust to various identifying assumptions, and are consistent with both publication bias and bias due to non-robust identification strategies to obtain causal estimates in the literature.
[Keywords: advertising, publication bias, generalizability]
“How Should We Critique Research?”, Branwen 2019 (2019-05-19):
Criticizing studies and statistics is hard in part because so many criticisms are possible, rendering them meaningless. What makes a good criticism is the chance of being a ‘difference which makes a difference’ to our ultimate actions.
Scientific and statistical research must be read with a critical eye to understand how credible the claims are. The Reproducibility Crisis and the growth of metascience have demonstrated that much research is of low quality and often false.
But there are so many possible things any given study could be criticized for, falling short of an unobtainable ideal, that it becomes unclear which possible criticism is important, and they may degenerate into mere rhetoric. How do we separate fatal flaws from unfortunate caveats from specious quibbling?
I offer a pragmatic criterion: what makes a criticism important is how much it could change a result if corrected and how much that would then change our decisions or actions: to what extent it is a “difference which makes a difference”.
This is why issues of research fraud, causal inference, or biases yielding overestimates are universally important: because a ‘causal’ effect turning out to be zero effect or grossly overestimated will change almost all decisions based on such research; while on the other hand, other issues like measurement error or distributional assumptions, which are equally common, are often not important: because they typically yield much smaller changes in conclusions, and hence decisions.
If we regularly ask whether a criticism would make this kind of difference, it will be clearer which ones are important criticisms, and which ones risk being rhetorical distractions and obstructing meaningful evaluation of research.
“Meta-learning of Sequential Strategies”, Ortega et al 2019 (2019-05-08):
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.
“Is the FDA Too Conservative or Too Aggressive?: A Bayesian Decision Analysis of Clinical Trial Design”, Isakov et al 2019
2019isakov.pdf
: “Is the FDA too conservative or too aggressive?: A Bayesian decision analysis of clinical trial design”, (20190104; ; similar):
Implicit in the drug-approval process is a host of decisions—target patient population, control group, primary endpoint, sample size, follow-up period, etc.—all of which determine the tradeoff between Type I and Type II error. We explore the application of Bayesian decision analysis (BDA) to minimize the expected cost of drug approval, where the relative costs of the two types of errors are calibrated using U.S. Burden of Disease Study 2010 data. The results for conventional fixed-sample randomized clinical-trial designs suggest that for terminal illnesses with no existing therapies such as pancreatic cancer, the standard threshold of 2.5% is substantially more conservative than the BDA-optimal threshold of 23.9% to 27.8%. For relatively less deadly conditions such as prostate cancer, 2.5% is more risk-tolerant or aggressive than the BDA-optimal threshold of 1.2% to 1.5%. We compute BDA-optimal sizes for 25 of the most lethal diseases and show how a BDA-informed approval process can incorporate all stakeholders’ views in a systematic, transparent, internally consistent, and repeatable manner.
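A minimal sketch of the underlying trade-off (with made-up costs, prior, and trial size rather than the paper’s Burden-of-Disease calibration): pick the one-sided significance threshold α that minimizes expected cost, where a false approval and a missed effective drug carry different costs.

```python
# Minimal sketch of a Bayesian decision analysis of the Type I/II trade-off,
# in the spirit of Isakov et al 2019 but with hypothetical costs, prior, and
# trial sizes (their model is calibrated to Burden of Disease data).
import numpy as np
from scipy.stats import norm

def expected_cost(alpha, n_per_arm, effect, sd, p_effective, cost_fp, cost_fn):
    """Expected cost of a two-arm fixed-sample trial run at one-sided level alpha."""
    z_alpha = norm.ppf(1 - alpha)
    se = sd * np.sqrt(2 / n_per_arm)
    power = 1 - norm.cdf(z_alpha - effect / se)    # P(approve | drug works)
    beta = 1 - power                               # P(miss an effective drug)
    return (1 - p_effective) * alpha * cost_fp + p_effective * beta * cost_fn

alphas = np.linspace(1e-4, 0.5, 2000)
# Hypothetical numbers: a deadly disease where missing an effective drug is
# far costlier than approving an ineffective one.
costs = [expected_cost(a, n_per_arm=200, effect=0.3, sd=1.0,
                       p_effective=0.3, cost_fp=1.0, cost_fn=30.0)
         for a in alphas]
best = alphas[int(np.argmin(costs))]
print(f"BDA-optimal one-sided alpha ≈ {best:.3f} (vs the conventional 0.025)")
```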
“Using the Results from Rigorous Multisite Evaluations to Inform Local Policy Decisions”, Orr et al 2019
2019orr.pdf
: “Using the Results from Rigorous Multisite Evaluations to Inform Local Policy Decisions”, Larry L. Orr, Robert B. Olsen, Stephen H. Bell, Ian Schmid, Azim Shivji, Elizabeth A. Stuart (20190101)
“Accounting Theory As a Bayesian Discipline”, Johnstone 2018
2018johnstone.pdf
: “Accounting Theory as a Bayesian Discipline”, (20181228; ; similar):
Accounting Theory as a Bayesian Discipline introduces Bayesian theory and its role in statistical accounting information theory. The Bayesian statistical logic of probability, evidence and decision lies at the historical and modern center of accounting thought and research. It is not only the presumed rule of reasoning in analytical models of accounting disclosure, it is the default position for empiricists when hypothesizing about how the users of financial statements think. Bayesian logic comes to light throughout accounting research and is the soul of most strategic disclosure models. In addition, Bayesianism is similarly a large part of the stated and unstated motivation of empirical studies of how market prices and their implied costs of capital react to better financial disclosure.
The approach taken in this monograph is a Demski 1973-like treatment of “accounting numbers” as “signals” rather than as “measurements”. It should of course be the case that “good” measurements like “quality earnings” reports make generally better signals. However, to be useful for decision making under uncertainty, accounting measurements need to have more than established accounting measurement virtues. This monograph explains what those Bayesian information attributes are, where they come from in Bayesian theory, and how they apply in statistical accounting information theory.
The Bayesian logic of probability, evidence and decision is the presumed rule of reasoning in analytical models of accounting disclosure. Any rational explication of the decadesold accounting notions of “information content”, “value relevance”, “decision useful”, and possibly conservatism, is inevitably Bayesian. By raising some of the probability principles, paradoxes and surprises in Bayesian theory, intuition in accounting theory about information, and its value, can be tested and enhanced. Of all the branches of the social sciences, accounting information theory begs Bayesian insights.
This monograph lays out the main logical constructs and principles of Bayesianism, and relates them to important contributions in the theoretical accounting literature. The approach taken is essentially “oldfashioned” normative statistics, building on the expositions of Demski, Ijiri, Feltham and other early accounting theorists who brought Bayesian theory to accounting theory. Some history of this nexus, and the role of business schools in the development of Bayesian statistics in the 1950–1970s, is described. Later developments in accounting, especially noisy rational expectations models under which the information reported by firms is endogenous, rather than unaffected or “drawn from nature”, make the task of Bayesian inference more difficult yet no different in principle.
The information user must still revise beliefs based on what is reported. The extra complexity is that users must allow for the firm’s perceived disclosure motives and other relevant background knowledge in their Bayesian models. A known strength of Bayesian modelling is that subjective considerations are admitted and formally incorporated. Allowances for perceived selfinterest or biased reporting, along with any other apparent signal defects or “information uncertainty”, are part and parcel of Bayesian information theory.
Introduction
Bayesianism Early in Accounting Theory
 Rise of Bayesian statistics
 Bayes in US business schools
 Early Bayesian accounting theorists
 Postscript
Survey of Bayesian Fundamentals
 All probability is subjective
 Inference comes first
 Bayesian learning
 No objective priors
 Independence is subjective
 No distinction between risk and uncertainty
 The likelihood function (ie. model)
 Sufficiency and the likelihood principle
 Coherence
 Coherent means no “Dutch book”
 Coherent is not necessarily accurate
 Accuracy is relative
 Odds form of Bayes theorem
 Data can’t speak for itself
 Ancillary information
 Nuisance parameters “integrate out”
 “Randomness” is subjective
 “Exchangeable” samples
 The Bayes factor
 Conditioning on all evidence
 Bayesian versus conventional inference
 Simpson’s paradox
 Data swamps prior
 Stable estimation
 Cromwell’s rule
 Decisions follow inference
 Inference, not estimation
 Calibration
 Economic scoring rules
 Market scoring rules
 Measures of information
 Ex ante versus ex post accuracy
 Sampling to forgone conclusion
 Predictive distributions
 Model averaging
 Definition of a subjectivist Bayesian
 What makes a Bayesian?
 Rise of Bayesianism in data science
Case Study: Using All the Evidence
 Interpreting “plevel ≤ α”
 Bayesian interpretation of frequentist reports
 A generic inference problem
Is Accounting Bayesian or Frequentist?
 2 Bayesian schools in accounting
 Markowitz, subjectivist Bayesian
 Characterization of information in accounting
 Why accounting literature emphasizes “precision”
 Bayesian description of information quality
 Likelihood function of earnings
 Capturing conditional conservatism
Decision Support Role of Accounting Information
 A formal Bayesian model
 Parallels with meteorology
 Bayesian fundamental analysis
Demski’s (1973) Impossibility Result
 Example: binary accounting signals
 Conservatism and the user’s risk aversion
Does Information Reduce Uncertainty
 Beaver’s (1968) prescription
 Bayesian basics
 Contrary views in accounting
 Bayesian roots in finance
 The general Bayesian law
 Rogers et al 2009
 Dye & Hughes 2018
 Why a Predictive Distribution?
 Limits to certainty
 Lewellen & Shanken 2002
 Neururer et al 2016
 Veronesi 1999
How Information Combines
 Combining 2 risky signals
Ex Ante Effect of Greater Risk/Uncertainty
 Risk adds to ex ante expected utility
 Implications for Bayesian decision analysis
 Volatility pumping
Ex Post Decision Outcomes: 1. Practical investment
 Economic Darwinism
 Bayesian Darwinian selection
 Good probability assessments
 Implications for accounting information
Information Uncertainty
 Bayesian definition of information uncertainty
 Bayesian treatment of information uncertainty
 Model risk as information risk
Conditioning Beliefs and the Cost of Capital
Numerical example
Interpretation: 14. Reliance on the NormalNormal Model
Intuitive counterexample
Appeal to the normalnormal model in accounting
Unknown variance, increasing after observation
Beyer 2009
Armstrong et al 2016
Bayesian Subjective Beta
 Core et al 2015
 Verrecchia 2001: Understated influence of the mean
 Decision analysis effect of the mean
Other Bayesian Points of Interest
 Accounting input in prediction models
 Earnings quality and accurate probability assessments
 Expected variance as a measure of information
 Information stays relevant
 Bayesian view of earnings management
 Numerator versus denominator news
 Mixtures of normals
 Information content
 Fundamental versus information risk
 When information adds to information asymmetry
 Value of independent information sources
 How might market probabilities behave?
 “Idiosyncratic” versus “undiversifiable” information
Conclusion
References
“Evolution As Backstop for Reinforcement Learning”, Branwen 2018
Backstop
: “Evolution as Backstop for Reinforcement Learning”, (20181206; ; backlinks; similar):
Markets/evolution as backstops/ground truths for reinforcement learning/optimization: on some connections between Coase’s theory of the firm/linear optimization/DRL/evolution/multicellular life/pain/Internet communities as multilevel optimization problems.
One defense of free markets notes the inability of non-market mechanisms to solve planning & optimization problems. This has difficulty with Coase’s paradox of the firm, and I note that the difficulty is increased by the fact that with improvements in computers, algorithms, and data, ever larger planning problems are solved. Expanding on some Cosma Shalizi comments, I suggest interpreting this phenomenon as a multilevel nested optimization paradigm: many systems can be usefully described as having two (or more) levels where a slow sample-inefficient but ground-truth ‘outer’ loss such as death, bankruptcy, or reproductive fitness, trains & constrains a fast sample-efficient but possibly misguided ‘inner’ loss which is used by learned mechanisms such as neural networks or linear programming (a group-selection perspective). So, one reason for free-market or evolutionary or Bayesian methods in general is that while poorer at planning/optimization in the short run, they have the advantage of simplicity and operating on ground-truth values, and serve as a constraint on the more sophisticated non-market mechanisms. I illustrate by discussing corporations, multicellular life, reinforcement learning & meta-learning in AI, and pain in humans. This view suggests that there are inherent balances between market/non-market mechanisms which reflect the relative advantages between a slow unbiased method and faster but potentially arbitrarily biased methods.
“Dog Cloning For Special Forces: Breed All You Can Breed”, Branwen 2018
Clone
: “Dog Cloning For Special Forces: Breed All You Can Breed”, (20180918; ; backlinks; similar):
Decision analysis of whether cloning the most elite Special Forces dogs is a profitable improvement over standard selection procedures. Unless training is extremely cheap or heritability is extremely low, dog cloning is hypothetically profitable.
Cloning is widely used in animal & plant breeding despite steep costs due to its advantages; more unusual recent applications include creating entire polo horse teams and reported trials of cloning in elite police/Special Forces war dogs. Given the cost of dog cloning, however, can this ever make more sense than standard screening methods for selecting from working dog breeds, or would the increase in successful dog training be too low under all reasonable models to turn a profit?
I model the question as one of expected cost per dog with the trait of successfully passing training, success in training being a dichotomous liability threshold with a polygenic genetic architecture; given the extreme level of selection possible in selecting the best among already-elite Special Forces dogs and a range of heritabilities, this predicts clones’ success probabilities. To approximate the relevant parameters, I look at some reported training costs and success rates for regular dog candidates, broad dog heritabilities, and the few current dog cloning case studies reported in the media.
Since none of the relevant parameters are known with confidence, I run the cost-benefit equation for many hypothetical scenarios, and find that in a large fraction of them covering most plausible values, dog cloning would improve training yields enough to be profitable (in addition to its other advantages).
As further illustration of the use-case of screening for an extreme outcome based on a partial predictor, I consider the question of whether height PGSes could be used to screen the US population for people of NBA height, which turns out to be reasonably doable with current & future PGSes.
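The liability-threshold step can be sketched in a few lines (hypothetical base rate, heritability, and donor quality, not the essay’s fitted values): a clone carries the donor’s whole genotype, so its expected liability is h² times the donor’s phenotypic deviation, with residual variance 1 − h⁴.

```python
# Rough sketch of the liability-threshold calculation with hypothetical numbers,
# not the essay's fitted parameters.
from scipy.stats import norm

base_rate = 0.5        # fraction of ordinary candidates passing training (assumed)
h2        = 0.5        # heritability of the training-success liability (assumed)
donor_z   = 2.5        # donor's liability, e.g. best of an already-elite pool (assumed)

threshold  = norm.ppf(1 - base_rate)                # liability cutoff for success
clone_mean = h2 * donor_z                           # expected liability of a clone
clone_sd   = (1 - h2**2) ** 0.5                     # sqrt(1 - h^4)
p_clone = 1 - norm.cdf((threshold - clone_mean) / clone_sd)

print(f"ordinary candidate success rate: {base_rate:.0%}")
print(f"predicted clone success rate:    {p_clone:.0%}")
```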
“Improving Widthbased Planning With Compact Policies”, Junyent et al 2018
“Improving widthbased planning with compact policies”, (20180615; ; similar):
Optimal action selection in decision problems characterized by sparse, delayed rewards is still an open challenge. For these problems, current deep reinforcement learning methods require enormous amounts of data to learn controllers that reach human-level performance. In this work, we propose a method that interleaves planning and learning to address this issue. The planning step hinges on the Iterated-Width (IW) planner, a state-of-the-art planner that makes explicit use of the state representation to perform structured exploration. IW is able to scale up to problems independently of the size of the state space. From the state-actions visited by IW, the learning step estimates a compact policy, which in turn is used to guide the planning step. The type of exploration used by our method is radically different from the standard random exploration used in RL. We evaluate our method in simple problems where we show it to have superior performance to the state-of-the-art reinforcement learning algorithms A2C and Alpha Zero. Finally, we present preliminary results in a subset of the Atari games suite.
“How to Train Your Oracle: The Delphi Method and Its Turbulent Youth in Operations Research and the Policy Sciences”, Dayé 2018
2018daye.pdf
: “How to train your oracle: The Delphi method and its turbulent youth in operations research and the policy sciences”, (2018; similar):
Delphi is a procedure that produces forecasts on technological and social developments. This article traces the history of Delphi’s development to the early 1950s, where a group of logicians and mathematicians working at the RAND Corporation carried out experiments to assess the predictive capacities of groups of experts. While Delphi now has a rather stable methodological shape, this was not so in its early years. The vision that Delphi’s creators had for their brainchild changed considerably. While they had initially seen it as a technique, a few years later they reconfigured it as a scientific method. After some more years, however, they conceived of Delphi as a tool. This turbulent youth of Delphi can be explained by parallel changes in the fields that were deemed relevant audiences for the technique, operations research and the policy sciences. While changing the shape of Delphi led to some success, it had severe, yet unrecognized methodological consequences. The core assumption of Delphi that the convergence of expert opinions observed over the iterative stages of the procedure can be interpreted as consensus, appears not to be justified for the third shape of Delphi as a tool that continues to be the most prominent one.
“PHacking and False Discovery in A/B Testing”, Berman et al 2018
2018berman.pdf
: “p-Hacking and False Discovery in A/B Testing”, Ron Berman, Leonid Pekelis, Aisling Scott, Christophe Van den Bulte (20180101; ; backlinks)
“On Having Enough Socks”, Branwen 2017
Socks
: “On Having Enough Socks”, (20171122; ; backlinks; similar):
Personal experience and surveys on running out of socks; discussion of socks as small example of human procrastination and irrationality, caused by lack of explicit deliberative thought where no natural triggers or habits exist.
After running out of socks one day, I reflected on how ordinary tasks get neglected. Anecdotally and in 3 online surveys, people report often not having enough socks, a problem which correlates with rarity of sock purchases and demographic variables, consistent with a neglect/procrastination interpretation: because there is no specific time or triggering factor to replenish a shrinking sock stockpile, it is easy to run out.
This reminds me of akrasia on minor tasks, ‘yak shaving’, and the nature of disaster in complex systems: lack of hard rules lets errors accumulate, without any ‘global’ understanding of the drift into disaster (or at least inefficiency). Humans on a smaller scale also ‘drift’ when they engage in System I reactive thinking & action for too long, resulting in cognitive biases. An example of drift is the generalized human failure to explore/experiment adequately, resulting in overly greedy exploitative behavior of the current local optimum. Grocery shopping provides a case study: despite large gains, most people do not explore, perhaps because there is no established routine or practice involving experimentation. Fixes for these things can be seen as ensuring that System II deliberative cognition is periodically invoked to review things at a global level, such as developing a habit of maximum exploration at first purchase of a food product, or annually reviewing possessions to note problems like a lack of socks.
While socks may be small things, they may reflect big things.
“An Analysis of the Value of Information When Exploring Stochastic, Discrete MultiArmed Bandits”, Sledge & Principe 2017
“An Analysis of the Value of Information when Exploring Stochastic, Discrete MultiArmed Bandits”, (20171008; ; similar):
In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to an optimal regret that is logarithmic with respect to the number of episodes.
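This is not the authors’ value-of-information criterion itself, only a minimal sketch of the mechanism it resembles: softmax exploration whose temperature (the exploration-controlling parameter) is annealed over episodes.

```python
# Illustrative temperature-annealed softmax exploration on a Bernoulli bandit.
# Not Sledge & Principe's exact criterion; arm probabilities and the cooling
# schedule's form are assumptions made for the sketch.
import numpy as np

rng = np.random.default_rng(1)
true_p = np.array([0.3, 0.5, 0.7])        # hypothetical arm payoff probabilities
K = len(true_p)
counts = np.ones(K)                        # pseudo-counts to avoid division by zero
values = np.zeros(K)                       # running mean reward per arm

T = 5000
regret = 0.0
for t in range(1, T + 1):
    temperature = 1.0 / np.log(t + 1.0)    # fast cooling schedule (assumed form)
    prefs = values / temperature
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    arm = rng.choice(K, p=probs)
    reward = rng.random() < true_p[arm]
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]
    regret += true_p.max() - true_p[arm]

print(f"average regret per episode after {T} episodes: {regret / T:.3f}")
```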
“Toward a Rational and Mechanistic Account of Mental Effort”, Shenhav et al 2017
2017shenhav.pdf
: “Toward a Rational and Mechanistic Account of Mental Effort”, (20170701; ; backlinks; similar):
In spite of its familiar phenomenology, the mechanistic basis for mental effort remains poorly understood. Although most researchers agree that mental effort is aversive and stems from limitations in our capacity to exercise cognitive control, it is unclear what gives rise to those limitations and why they result in an experience of control as costly. The presence of these control costs also raises further questions regarding how best to allocate mental effort to minimize those costs and maximize the attendant benefits. This review explores recent advances in computational modeling and empirical research aimed at addressing these questions at the level of psychological process and neural mechanism, examining both the limitations to mental effort exertion and how we manage those limited cognitive resources. We conclude by identifying remaining challenges for theoretical accounts of mental effort as well as possible applications of the available findings to understanding the causes of and potential solutions for apparent failures to exert the mental effort required of us.
[Keywords: motivation, cognitive control, decision making, reward, prefrontal cortex, executive function]
“Pricing the Future in the 17^{th} Century: Calculating Technologies in Competition”, Deringer 2017
“Pricing the Future in the 17^{th} Century: Calculating Technologies in Competition”, (201704; ; backlinks; similar):
Time is money. But how much? What is money in the future worth to you today? This question of “present value” arises in myriad economic activities, from valuing financial securities to real estate transactions to governmental costbenefit analysis—even the economics of climate change. In modern capitalist practice, one calculation offers the only “rational” way to answer: compoundinterest discounting. In the early modern period, though, economic actors used at least two alternative calculating technologies for thinking about present value, including a vernacular technique called years purchase and discounting by simple interest. All of these calculations had different strengths and affordances, and none was unquestionably better or more “rational” than the others at the time. The history of technology offers distinct resources for understanding such technological competitions, and thus for understanding the emergence of modern economic temporality.
“Neural Combinatorial Optimization With Reinforcement Learning”, Bello et al 2017
“Neural Combinatorial Optimization with Reinforcement Learning”, (20170217; ; backlinks; similar):
[Keywords: neural combinatorial optimization, reinforcement learning]
We present a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent neural network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent neural network using a policy gradient method. Without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. These results, albeit still quite far from stateoftheart, give insights into how neural networks can be used as a general tool for tackling combinatorial optimization problems.
“SelfBlinded Mineral Water Taste Test”, Branwen 2017
Water
: “SelfBlinded Mineral Water Taste Test”, (20170215; ; backlinks; similar):
Blind randomized taste-test of mineral/distilled/tap waters using Bayesian best-arm finding; no large differences in preference.
The kind of water used in tea is claimed to make a difference in the flavor: mineral water being better than tap water or distilled water. However, mineral water is vastly more expensive than tap water.
To test the claim, I run a preliminary test of pure water to see if any water differences are detectable at all. I compared my tap water, 3 distilled water brands (Great Value, Nestle Pure Life, & Poland Spring), 1 osmosis-purified brand (Aquafina), and 3 non-carbonated mineral water brands (Evian, Voss, & Fiji) in a series of n = 67 blinded randomized comparisons of water flavor. The comparisons are modeled using a Bradley-Terry competitive model implemented in Stan; comparisons were chosen using an adaptive Bayesian best-arm sequential trial (racing) method designed to locate the best-tasting water in the minimum number of samples by preferentially comparing the best-known arm to potentially superior arms. Blinding & randomization are achieved by using a Lazy Susan to physically randomize two identical (but marked in a hidden spot) cups of water.
The final posterior distribution indicates that some differences between waters are likely to exist but are small & imprecisely estimated and of little practical concern.
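A stripped-down version of the core model (the essay fits a Bayesian Bradley-Terry model in Stan with adaptive best-arm sampling; this sketch is only the pairwise-preference likelihood, fit by maximum likelihood on invented comparison outcomes):

```python
# Stripped-down Bradley-Terry model for pairwise taste comparisons, fit by
# maximum likelihood on made-up data (the essay's version is Bayesian, in Stan).
import numpy as np
from scipy.optimize import minimize

waters = ["tap", "Evian", "Fiji", "Aquafina"]
# (winner_index, loser_index) pairs: invented outcomes for illustration
comparisons = [(1, 0), (1, 0), (2, 0), (0, 3), (1, 3), (2, 3), (1, 2), (0, 2)]

def neg_log_lik(strengths):
    s = np.concatenate([[0.0], strengths])        # fix the first water's strength at 0
    ll = 0.0
    for winner, loser in comparisons:
        ll += s[winner] - np.logaddexp(s[winner], s[loser])   # log P(winner beats loser)
    return -ll

fit = minimize(neg_log_lik, x0=np.zeros(len(waters) - 1))
strengths = np.concatenate([[0.0], fit.x])
for name, s in zip(waters, strengths):
    print(f"{name:10s} latent strength {s:+.2f}")
```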
“The Kelly CoinFlipping Game: Exact Solutions”, Branwen et al 2017
Coinflip
: “The Kelly CoinFlipping Game: Exact Solutions”, (20170119; ; backlinks; similar):
Decision-theoretic analysis of how to optimally play Haghani & Dewey 2016’s 300-round double-or-nothing coin-flipping game with an edge and ceiling better than using the Kelly Criterion. Computing and following an exact decision tree increases earnings by $6.6 over a modified KC.
Haghani & Dewey 2016 experiment with a doubleornothing coinflipping game where the player starts with $30.4[^\$25.0^~2016~]{.supsub} and has an edge of 60%, and can play 300 times, choosing how much to bet each time, winning up to a maximum ceiling of $303.8[^\$250.0^~2016~]{.supsub}. Most of their subjects fail to play well, earning an average $110.6[^\$91.0^~2016~]{.supsub}, compared to Haghani & Dewey 2016’s heuristic benchmark of ~$291.6[^\$240.0^~2016~]{.supsub} in winnings achievable using a modified Kelly Criterion as their strategy. The KC, however, is not optimal for this problem as it ignores the ceiling and limited number of plays.
We solve the problem of the value of optimal play exactly by using decision trees & dynamic programming for calculating the value function, with implementations in R, Haskell, and C. We also provide a closedform exact value formula in R & Python, several approximations using Monte Carlo/random forests/neural networks, visualizations of the value function, and a Python implementation of the game for the OpenAI Gym collection. We find that optimal play yields $246.61 on average (rather than ~$240), and so the human players actually earned only 36.8% of what was possible, losing $155.6 in potential profit. Comparing decision trees and the Kelly criterion for various horizons (bets left), the relative advantage of the decision tree strategy depends on the horizon: it is highest when the player can make few bets (at b = 23, with a difference of ~$36), and decreases with number of bets as more strategies hit the ceiling.
In the Kelly game, the maximum winnings, number of rounds, and edge are fixed; we describe a more difficult generalized version in which the 3 parameters are drawn from Pareto, normal, and beta distributions and are unknown to the player (who can use Bayesian inference to try to estimate them during play). Upper and lower bounds are estimated on the value of this game. In the variant of this game where subjects are not told the exact edge of 60%, a Bayesian decision tree approach shows that performance can closely approach that of the decision tree, with a penalty for 1 plausible prior of only $1. Two deep reinforcement learning agents, DQN & DDPG, are implemented but DQN fails to learn and DDPG doesn’t show acceptable performance, indicating better deep RL methods may be required to solve the generalized Kelly game.
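The value-function recursion is easy to sketch with wealth and bets coarsened to whole dollars (the essay’s implementations are exact and in R/Haskell/C, so the number printed here only approximates the $246.61 figure, and the run takes a minute or two):

```python
# Dynamic-programming sketch of the coin-flipping game's value function with a
# coarse $1 discretization of wealth and bets: $25 start, 60% edge, 300 bets,
# $250 ceiling. V(w, r) = max over bets b of p*V(min(w+b, cap), r-1) + (1-p)*V(w-b, r-1).
from functools import lru_cache

P, CAP, START, ROUNDS = 0.6, 250, 25, 300

@lru_cache(maxsize=None)
def value(wealth, rounds_left):
    """Expected final wealth under optimal whole-dollar betting."""
    if wealth <= 0 or wealth >= CAP or rounds_left == 0:
        return float(min(max(wealth, 0), CAP))
    best = 0.0
    for bet in range(0, wealth + 1):          # betting $0 (passing) is allowed
        ev = (P * value(min(wealth + bet, CAP), rounds_left - 1)
              + (1 - P) * value(wealth - bet, rounds_left - 1))
        best = max(best, ev)
    return best

print(f"approximate value of optimal play: ${value(START, ROUNDS):.2f}")
```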
“Banner Ads Considered Harmful”, Branwen 2017
Ads
: “Banner Ads Considered Harmful”, (20170108; ; backlinks; similar):
9 months of daily A/B-testing of Google AdSense banner ads on Gwern.net indicates banner ads decrease total traffic substantially, possibly due to spillover effects in reader engagement and resharing.
One source of complexity & JavaScript use on Gwern.net is the use of Google AdSense advertising to insert banner ads. In considering design & usability improvements, removing the banner ads comes up every time as a possibility, as readers do not like ads, but such removal comes at a revenue loss and it’s unclear whether the benefit outweighs the cost, suggesting I run an A/B experiment. However, ads might be expected to have broader effects on traffic than individual page reading times/bounce rates, affecting total site traffic instead through long-term effects on or spillover mechanisms between readers (eg. social media behavior), rendering the usual A/B testing method of per-pageload/session randomization incorrect; instead it would be better to analyze total traffic as a time-series experiment.
Design: A decision analysis of revenue vs readers yields a maximum acceptable total traffic loss of ~3%. Power analysis of historical Gwern.net traffic data demonstrates that the high autocorrelation yields low statistical power with standard tests & regressions but acceptable power with ARIMA models. I design a long-term Bayesian ARIMA(4,0,1) time-series model in which an A/B test running January–October 2017 in randomized paired 2-day blocks of ads/no-ads uses client-local JS to determine whether to load & display ads, with total traffic data collected in Google Analytics & ad exposure data in Google AdSense. The A/B test ran from 2017-01-01 to 2017-10-15, affecting 288 days with collectively 380,140 pageviews in 251,164 sessions. Correcting for a flaw in the randomization, the final results yield a surprisingly large estimate of an expected traffic loss of −9.7% (driven by the subset of users without adblock), with an implied −14% traffic loss if all traffic were exposed to ads (95% credible interval: −13–16%), exceeding my decision threshold for disabling ads & strongly ruling out the possibility of acceptably small losses which might justify further experimentation.
Thus, banner ads on Gwern.net appear to be harmful and AdSense has been removed. If these results generalize to other blogs and personal websites, an important implication is that many websites may be harmed by their use of banner ad advertising without realizing it.
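A minimal frequentist sketch of the model structure (the essay’s actual analysis is Bayesian, and the CSV columns here are hypothetical): regress log daily pageviews on a 0/1 “ads shown” indicator with ARIMA(4,0,1) errors.

```python
# Sketch only: ARIMA(4,0,1) regression of log daily pageviews on an "ads on"
# indicator. The essay's model is Bayesian; the file and column names below
# ("traffic.csv", "pageviews", "ads") are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_csv("traffic.csv", parse_dates=["date"]).set_index("date").asfreq("D")

model = ARIMA(np.log(df["pageviews"]), exog=df["ads"], order=(4, 0, 1))
res = model.fit()

effect = res.params["ads"]
print(res.summary())
print(f"estimated effect of ads on log traffic: {effect:+.3f} "
      f"(≈ {100 * (np.exp(effect) - 1):+.1f}% traffic change)")
```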
“The Risk Elicitation Puzzle”, Pedroni et al 2017
2017pedroni.pdf
: “The risk elicitation puzzle”, Andreas Pedroni, Renato Frey, Adrian Bruhin, Gilles Dutilh, Ralph Hertwig, Jörg Rieskamp (20170101)
“Was Angelina Jolie Right? Optimizing Cancer Prevention Strategies Among BRCA Mutation Carriers”, Nohdurft et al 2017
2017nohdurft.pdf
: “Was Angelina Jolie Right? Optimizing Cancer Prevention Strategies Among BRCA Mutation Carriers”, Eike Nohdurft, Elisa Long, Stefan Spinler (20170101)
“Internet WiFi Improvement”, Branwen 2016
WiFi
: “Internet WiFi improvement”, (20161020; ; backlinks; similar):
After putting up with slow glitchy WiFi Internet for years, I investigate improvements. Upgrading the router, switching to a high-gain antenna, and installing a buried Ethernet cable all offer increasing speeds.
My laptop in my apartment receives Internet via a WiFi repeater to another house, yielding slow speeds and frequent glitches. I replaced the obsolete WiFi router, which increased connection speeds somewhat but still left them inadequate. For a better solution, I used a directional antenna to connect directly to the new WiFi router, which, contrary to my expectations, yielded a ~6× increase in speed. Extensive benchmarking of all possible arrangements of laptops/dongles/repeaters/antennas/routers/positions shows that the antenna+router is inexpensive and near-optimal in speed, and that the only possible improvement would be a hardwired Ethernet line, which I installed a few weeks later after learning it was not as difficult as I thought it would be.
“Why Tool AIs Want to Be Agent AIs”, Branwen 2016
ToolAI
: “Why Tool AIs Want to Be Agent AIs”, (20160907; ; backlinks; similar):
AIs limited to pure computation (Tool AIs) supporting humans, will be less intelligent, efficient, and economically valuable than more autonomous reinforcementlearning AIs (Agent AIs) who act on their own and metalearn, because all problems are reinforcementlearning problems.
Autonomous AI systems (Agent AIs) trained using reinforcement learning can do harm when they take wrong actions, especially superintelligent Agent AIs. One solution would be to eliminate their agency by not giving AIs the ability to take actions, confining them to purely informational or inferential tasks such as classification or prediction (Tool AIs), and have all actions be approved & executed by humans, giving equivalently superintelligent results without the risk.
I argue that this is not an effective solution for two major reasons. First, because Agent AIs will by definition be better at actions than Tool AIs, giving an economic advantage. Secondly, because Agent AIs will be better at inference & learning than Tool AIs, and this is inherently due to their greater agency: the same algorithms which learn how to perform actions can be used to select important datapoints to learn inference over, how long to learn, how to more efficiently execute inference, how to design themselves, how to optimize hyperparameters, how to make use of external resources such as longterm memories or external software or large databases or the Internet, and how best to acquire new data.
All of these actions will result in Agent AIs more intelligent than Tool AIs, in addition to their greater economic competitiveness. Thus, Tool AIs will be inferior to Agent AIs in both actions and intelligence, implying use of Tool AIs is an even more highly unstable equilibrium than previously argued, as users of Agent AIs will be able to outcompete them on two dimensions (and not just one).
“Candy Japan’s New Box A/B Test”, Branwen 2016
CandyJapan
: “Candy Japan’s new box A/B test”, (20160506; ; backlinks; similar):
Bayesian decision-theoretic analysis of the effect of fancier packaging on subscription cancellations & optimal experiment design.
I analyze an A/B test from a mail-order company of two different kinds of box packaging from a Bayesian decision-theory perspective, balancing posterior probability of improvements & greater profit against the cost of packaging & risk of worse results, finding that as the company’s analysis suggested, the new box is unlikely to be sufficiently better than the old. Calculating expected values of information shows that it is not worth experimenting on further, and that such fixed-sample trials are unlikely to ever be cost-effective for packaging improvements. However, adaptive experiments may be worthwhile.
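A minimal sketch of the decision-theoretic comparison with invented counts and costs (the post uses Candy Japan’s actual numbers): Beta posteriors over each box’s cancellation rate, the probability the new box is better, and the expected profit change net of the extra packaging cost.

```python
# Sketch of the decision analysis with made-up numbers, not Candy Japan's data.
import numpy as np

rng = np.random.default_rng(2)

# hypothetical A/B results: (subscribers, cancellations)
old_n, old_cancel = 400, 60
new_n, new_cancel = 400, 50

draws = 200_000
p_old = rng.beta(1 + old_cancel, 1 + old_n - old_cancel, draws)
p_new = rng.beta(1 + new_cancel, 1 + new_n - new_cancel, draws)

value_per_retained = 30.0     # assumed lifetime value of a retained subscriber, $
extra_box_cost     = 0.50     # assumed extra cost of the fancier box per shipment, $

profit_gain = (p_old - p_new) * value_per_retained - extra_box_cost

print(f"P(new box has lower cancellation rate) = {np.mean(p_new < p_old):.2f}")
print(f"expected profit change per subscriber  = ${profit_gain.mean():+.2f}")
print(f"P(switching is profitable)             = {np.mean(profit_gain > 0):.2f}")
```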
“Embryo Selection For Intelligence”, Branwen 2016
Embryoselection
: “Embryo Selection For Intelligence”, (20160122; ; backlinks; similar):
A cost-benefit analysis of the marginal cost of IVF-based embryo selection for intelligence and other traits with 2016–2017 state-of-the-art.
With genetic predictors of a phenotypic trait, it is possible to select embryos during an in vitro fertilization process to increase or decrease that trait. Extending the work of Shulman & Bostrom 2014/Hsu 2014, I consider the case of human intelligence using SNP-based genetic prediction, finding:
 a meta-analysis of GCTA results indicates that SNPs can explain >33% of variance in current intelligence scores, and >44% with better-quality phenotype testing
 this sets an upper bound on the effectiveness of SNP-based selection: a gain of 9 IQ points when selecting the top embryo out of 10 (see the order-statistics sketch below)
 the best 2016 polygenic score could achieve a gain of ~3 IQ points when selecting out of 10
 the marginal cost of embryo selection (assuming IVF is already being done) is modest, at $1,822.7[^\$1,500.0^~2016~]{.supsub} + $243.0[^\$200.0^~2016~]{.supsub} per embryo, with the sequencing cost projected to drop rapidly
 a model of the IVF process, incorporating number of extracted eggs, losses to abnormalities & vitrification & failed implantation & miscarriages from 2 real IVF patient populations, estimates feasible gains of 0.39 & 0.68 IQ points
 embryo selection is currently unprofitable (mean: $435.0[^\$358.0^~2016~]{.supsub}) in the USA under the lowest estimate of the value of an IQ point, but profitable under the highest (mean: $7,570.3[^\$6,230.0^~2016~]{.supsub}). The main constraint on selection profitability is the polygenic score; under the highest value, the NPV EVPI of a perfect SNP predictor is $29.2[^\$24.0^~2016~]{.supsub}b and the EVSI per education/SNP sample is $86.3[^\$71.0^~2016~]{.supsub}k
 under the worst-case estimate, selection can be made profitable with a better polygenic score, which would require n > 237,300 using education phenotype data (and much less using fluid intelligence measures)
 selection can be made more effective by selecting on multiple phenotype traits: considering an example using 7 traits (IQ/height/BMI/diabetes/ADHD/bipolar/schizophrenia), there is a factor gain over IQ alone; the outperformance of multiple selection remains after adjusting for genetic correlations & polygenic scores and using a broader set of 16 traits.
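A rough order-statistics sketch of where the “9 IQ points from the best of 10” upper bound comes from (a simplification of the essay’s full IVF model): gain ≈ trait SD × √(predictor variance ÷ 2, since siblings share roughly half the variation) × the expected maximum of n standard normals.

```python
# Rough order-statistics approximation of the selection upper bound; the 33%
# variance figure is from the abstract, the halving of variance for siblings
# and the omission of IVF losses are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(3)

n_embryos = 10
var_explained = 0.33            # GCTA upper bound on SNP-explained variance
sd_iq = 15.0

# Expected maximum of n standard normals, by Monte Carlo
expected_max = rng.standard_normal((500_000, n_embryos)).max(axis=1).mean()

gain = sd_iq * np.sqrt(var_explained / 2) * expected_max
print(f"E[max of {n_embryos} std normals] ≈ {expected_max:.2f}")
print(f"approximate upper-bound gain ≈ {gain:.1f} IQ points")
```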
 Overview of Major Approaches
 FAQ: Frequently Asked Questions
 Embryo Selection CostEffectiveness
 Iterated Embryo Selection
 Cloning
 See Also
 External Links
 Appendix
 IQ / Income Bibliography
 The Genius Factory, Plotz 2005
 Kong Et Al 2017 Polygenic Score Decline Derivation
 The Bell Curve, Murray & Herrnstein 1994: Dysgenics Opportunity Cost
 Embryo Selection And Dynasties
 Polygenic Scores In Plink
 History of IES
 Glue Robbers: Sequencing Nobelists Using Collectible Letters
“Bitter Melon for Blood Glucose”, Branwen 2015
Melon
: “Bitter Melon for blood glucose”, (20150914; ; similar):
Analysis of whether bitter melon reduces blood glucose in one self-experiment, and utility of further self-experimentation.
I reanalyze a bitter-melon/blood-glucose self-experiment, finding a small effect of increasing blood glucose after correcting for temporal trends & daily variation, giving both frequentist & Bayesian analyses. I then analyze the self-experiment from a subjective Bayesian decision-theoretic perspective, cursorily estimating the costs of diabetes & benefits of intervention in order to estimate Value Of Information for the self-experiment and the benefit of further self-experimenting; I find that the expected value of more data (EVSI) is negative and further self-experimenting would not be optimal compared to trying out other anti-diabetes interventions.
“Deep DPG (DDPG): Continuous Control With Deep Reinforcement Learning”, Lillicrap et al 2015
“Deep DPG (DDPG): Continuous control with deep reinforcement learning”, (20150909; ; backlinks; similar):
We adapt the ideas underlying the success of Deep QLearning to the continuous action domain.
We present an actorcritic, modelfree algorithm based on the deterministic policy gradient that can operate over continuous action spaces.
Using the same learning algorithm, network architecture and hyperparameters, our DDPG algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swingup, dexterous manipulation [gripper/reacher], legged locomotion [Cheetah/walker] and car driving [TORCS]. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives.
We further demonstrate that for many of the tasks the algorithm can learn policies endtoend: directly from raw pixel inputs.
“The Unfavorable Economics of Measuring the Returns to Advertising”, Lewis & Rao 2015
2015lewis.pdf
: “The Unfavorable Economics of Measuring the Returns to Advertising”, (20150706; ; backlinks; similar):
25 large field experiments with major U.S. retailers and brokerages, most reaching millions of customers and collectively representing $3.53^{$2.80}_{2015} million in digital advertising expenditure, reveal that measuring the returns to advertising is difficult.
The median confidence interval on return on investment is over 100 percentage points wide. Detailed sales data show that relative to the per capita cost of the advertising, individual-level sales are very volatile; a coefficient of variation of 10 is common. Hence, informative advertising experiments can easily require more than 10 million person-weeks, making experiments costly and potentially infeasible for many firms.
Despite these unfavorable economics, randomized control trials represent progress by injecting new, unbiased information into the market. The inference challenges revealed in the field experiments also show that selection bias, due to the targeted nature of advertising, is a crippling concern for widely employed observational methods.
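The unfavorable arithmetic can be reproduced on the back of an envelope with the usual two-sample formula; the dollar figures below are invented, but with sales noise ~10× the mean and an effect that is a modest multiple of a small per-person ad cost, the required sample lands well past the paper’s ten-million person-week mark.

```python
# Back-of-the-envelope version of the argument with hypothetical dollar figures:
# sales per person are ~10x as volatile as their mean, while the effect worth
# detecting is a modest multiple of a small per-person ad spend.
from scipy.stats import norm

mean_sales    = 7.00               # mean sales per person-week, $ (assumed)
sd_sales      = 10 * mean_sales    # coefficient of variation of 10, as in the paper
ad_cost       = 0.14               # ad spend per person, $ (assumed)
roi_to_detect = 0.25               # want to distinguish e.g. +25% ROI from 0% ROI

delta = roi_to_detect * ad_cost                 # incremental sales per person to detect
z_alpha, z_beta = norm.ppf(0.975), norm.ppf(0.8)
n_per_arm = 2 * (z_alpha + z_beta) ** 2 * (sd_sales / delta) ** 2

print(f"required sample size ≈ {2 * n_per_arm / 1e6:.0f} million person-weeks")
```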
“When Should I Check The Mail?”, Branwen 2015
Maildelivery
: “When Should I Check The Mail?”, (20150621; ; backlinks; similar):
Bayesian decision-theoretic analysis of local mail delivery times: modeling deliveries as survival analysis, model comparison, optimizing check times with a loss function, and optimal data collection.
Mail is delivered by the USPS mailman at a regular but not observed time; what is observed is whether the mail has been delivered at a time, yielding somewhat-unusual “interval-censored data”. I describe the problem of estimating when the mailman delivers, write a simulation of the data-generating process, and demonstrate analysis of interval-censored data in R using maximum-likelihood (survival analysis with Gaussian regression using the survival library), MCMC (Bayesian model in JAGS), and likelihood-free Bayesian inference (custom ABC, using the simulation). This allows estimation of the distribution of mail delivery times. I compare those estimates from the interval-censored data with estimates from a (smaller) set of exact delivery-times provided by USPS tracking & personal observation, using a multilevel model to deal with heterogeneity apparently due to a change in USPS routes/postmen. Finally, I define a loss function on mail checks, enabling: a choice of optimal time to check the mailbox to minimize loss (exploitation); optimal time to check to maximize information gain (exploration); Thompson sampling (balancing exploration & exploitation indefinitely), and estimates of the value-of-information of another datapoint (to estimate when to stop exploration and start exploitation after a finite amount of data).
“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, Ioffe & Szegedy 2015
“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, (20150211; backlinks; similar):
Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training minibatch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a stateoftheart image classification model, Batch Normalization achieves the same accuracy with 14× fewer training steps, and beats the original model by a significant margin. Using an ensemble of batchnormalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
“Selectiongain: an R Package for Optimizing Multistage Selection”, Mi et al 2015
2015mi.pdf
: “Selectiongain: an R package for optimizing multistage selection”, Xuefei Mi, H. Friedrich Utz, Albrecht E. Melchinger (20150101; backlinks)
“Focusing on the Longterm: It’s Good for Users and Business”, Hohnhold et al 2015
2015hohnhold.pdf
: “Focusing on the Longterm: It’s Good for Users and Business”, (2015; ; backlinks; similar):
Over the past 10+ years, online companies large and small have adopted widespread A/B testing as a robust data-based method for evaluating potential product improvements. In online experimentation, it is straightforward to measure the short-term effect, ie. the impact observed during the experiment. However, the short-term effect is not always predictive of the long-term effect, ie. the final impact once the product has fully launched and users have changed their behavior in response. Thus, the challenge is how to determine the long-term user impact while still being able to make decisions in a timely manner.
We tackle that challenge in this paper by first developing experiment methodology for quantifying long-term user learning. We then apply this methodology to ads shown on Google search, more specifically, to determine and quantify the drivers of ads blindness and sightedness, the phenomenon of users changing their inherent propensity to click on or interact with ads.
We use these results to create a model that uses metrics measurable in the short-term to predict the long-term. We learn that user satisfaction is paramount: ads blindness and sightedness are driven by the quality of previously viewed or clicked ads, as measured by both ad relevance and landing page quality. Focusing on user satisfaction not only ensures happier users but also makes business sense, as our results illustrate. We describe two major applications of our findings: a conceptual change to our search ads auction that further increased the importance of ads quality, and a 50% reduction of the ad load on Google’s mobile search interface.
The results presented in this paper are generalizable in two major ways. First, the methodology may be used to quantify user learning effects and to evaluate online experiments in contexts other than ads. Second, the ads blindness/sightedness results indicate that a focus on user satisfaction could help to reduce the ad load on the internet at large with long-term neutral, or even positive, business impact.
[Keywords: Controlled experiments; A/B testing; predictive modeling; overall evaluation criterion]
“Thompson Sampling With the Online Bootstrap”, Eckles & Kaptein 2014
“Thompson sampling with the online bootstrap”, (20141015; ; similar):
Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large scale bandit problems, and its performance is dependent on the model fit to the observed data. We introduce bootstrap Thompson sampling (BTS), a heuristic method for solving bandit problems which modifies Thompson sampling by replacing the posterior distribution used in Thompson sampling by a bootstrap distribution. We first explain BTS and show that the performance of BTS is competitive to Thompson sampling in the wellstudied Bernoulli bandit case. Subsequently, we detail why BTS using the online bootstrap is more scalable than regular Thompson sampling, and we show through simulation that BTS is more robust to a misspecified error distribution. BTS is an appealing modification of Thompson sampling, especially when samples from the posterior are otherwise not available or are costly.
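A minimal sketch of BTS on a Bernoulli bandit (arm probabilities, the number of replicates, and the double-or-nothing resampling weights below are illustrative choices, following the paper’s description loosely):

```python
# Bootstrap Thompson sampling sketch: each arm keeps J online-bootstrap replicates
# of its mean reward; each new observation updates a replicate with probability 1/2
# ("double-or-nothing" weights), and arms are chosen by a replicate drawn at random.
import numpy as np

rng = np.random.default_rng(4)
true_p = [0.4, 0.5, 0.6]          # hypothetical arm success probabilities
K, J, T = len(true_p), 100, 5000

sums   = np.ones((K, J))          # one pseudo-success / two pseudo-trials per
counts = np.full((K, J), 2.0)     # replicate, as a weak prior
pulls  = np.zeros(K, dtype=int)

for t in range(T):
    j = rng.integers(J)                       # pick one bootstrap replicate at random
    arm = int(np.argmax(sums[:, j] / counts[:, j]))
    reward = float(rng.random() < true_p[arm])
    include = rng.random(J) < 0.5             # double-or-nothing: each replicate sees
    sums[arm, include]   += 2 * reward        # the new observation with prob. 1/2,
    counts[arm, include] += 2                 # with weight 2 when it does
    pulls[arm] += 1

print("pulls per arm:", dict(zip(map(str, true_p), pulls)))
```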
“Statistical Notes”, Branwen 2014
Statisticalnotes
: “Statistical Notes”, (20140717; ; backlinks; similar):
Miscellaneous statistical stuff
Given two disagreeing polls, one small & imprecise but taken at facevalue, and the other large & precise but with a high chance of being totally mistaken, what is the right Bayesian model to update on these two datapoints? I give ABC and MCMC implementations of Bayesian inference on this problem and find that the posterior is bimodal with a mean estimate close to the large unreliable poll’s estimate but with wide credible intervals to cover the mode based on the small reliable poll’s estimate.
 Critiques
 “Someone Should Do Something”: Wishlist of Miscellaneous Project Ideas
 Estimating censored test scores
 The Traveling Gerontologist problem
 Bayes nets
 Genome sequencing costs
 Proposal: handcounting mobile app for more fluid group discussions
 Air conditioner replacement
 Some ways of dealing with measurement error
 Value of Information: clinical prediction instruments for suicide
 Bayesian Model Averaging
 Dealing with allornothing unreliability of data
 Dysgenics power analysis
 Power analysis for racial admixture studies of continuous variables
 Operating on an aneurysm
 The Power of Twins: Revisiting Student’s Scottish Milk Experiment Example
 RNN metadata for mimicking individual author style
 MCTS
 Candy Japan A/B test
 DeFriesFulker power analysis
 Inferring mean IQs from SMPY / TIP elite samples
 Genius Revisited: On the Value of High IQ Elementary Schools
 Great Scott! Personal Name Collisions and the Birthday Paradox
 Detecting fake (human) Markov chain bots
 Optimal Existential Risk Reduction Investment
 Model Criticism via Machine Learning
 Proportion of Important Thinkers by Global Region Over Time in Charles Murray’s Human Accomplishment
 Program for nonspacedrepetition review of past written materials for serendipity & rediscovery: Archive Revisiter
 On the value of new statistical methods
 Bayesian power analysis: probability of exact replication
 Expectations are not expected deviations and large number of variables are not large samples
 Oh Deer: Could Deer Evolve to Avoid Car Accidents?
 Evolution as Backstop for Reinforcement Learning
 Acne: a good Quantified Self topic
 Fermi calculations
 Selective Emigration and Personality Trait Change
 The Most Abandoned Books on GoodReads
“Playing Atari With Deep Reinforcement Learning”, Mnih et al 2013
“Playing Atari with Deep Reinforcement Learning”, (20131219; backlinks; similar):
We present the first deep learning model to successfully learn control policies directly from highdimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Qlearning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
“On the Near Impossibility of Measuring the Returns to Advertising”, Lewis & Rao 2013
“On the Near Impossibility of Measuring the Returns to Advertising”, (20130423; ; backlinks; similar):
Classical theories of the firm assume access to reliable signals to measure the causal impact of choice variables on profit.
For advertising expenditure we show, using 25 online field experiments (representing $3.69^{$2.80}_{2013} million) with major U.S. retailers and brokerages, that this assumption typically does not hold. Statistical evidence from the randomized trials is very weak because individual-level sales are incredibly volatile relative to the per capita cost of a campaign—a “small” impact on a noisy dependent variable can generate positive returns.
A concise statistical argument shows that the required sample size for an experiment to generate sufficiently informative confidence intervals is typically in excess of ten million person-weeks. This also implies that heterogeneity bias (or model misspecification) unaccounted for by observational methods only needs to explain a tiny fraction of the variation in sales to severely bias estimates.
The weak informational feedback means most firms cannot even approach profit maximization.
“Caffeine Wakeup Experiment”, Branwen 2013
Caffeine
: “Caffeine wakeup experiment”, (20130407; ; backlinks; similar):
Self-experiment on whether consuming caffeine immediately upon waking results in less time in bed & higher productivity. The results indicate a small and uncertain effect.
One trick to combat morning sluggishness is to get caffeine extra-early by using caffeine pills shortly before or upon trying to get up. From 2013–2014 I ran a blinded & placebo-controlled randomized experiment measuring the effect of caffeine pills in the morning upon awakening time and daily productivity. The estimated effect is small and the posterior probability relatively low, but a decision analysis suggests that since caffeine pills are so cheap, it would be worthwhile to conduct another experiment; however, increasing Zeo equipment problems have made me hold off additional experiments indefinitely.
“Experimental Design for Partially Observed Markov Decision Processes”, Thorbergsson & Hooker 2012
“Experimental design for Partially Observed Markov Decision Processes”, (20120918; ; similar):
This paper deals with the question of how to most effectively conduct experiments in Partially Observed Markov Decision Processes so as to provide data that is most informative about a parameter of interest. Methods from Markov decision processes, especially dynamic programming, are introduced and then used in an algorithm to maximize a relevant Fisher Information. The algorithm is then applied to two POMDP examples. The methods developed can also be applied to stochastic dynamical systems, by suitable discretization, and we consequently show what control policies look like in the MorrisLecar Neuron model, and simulation results are presented. We discuss how parameter dependence within these methods can be dealt with by the use of priors, and develop tools to update control policies online. This is demonstrated in another stochastic dynamical system describing growth dynamics of DNA template in a PCR model.
“Rerandomization to Improve Covariate Balance in Experiments”, Morgan & Rubin 2012
2012morgan.pdf
: “Rerandomization to improve covariate balance in experiments”, (20120718; backlinks; similar):
Randomized experiments are the “gold standard” for estimating causal effects, yet often in practice, chance imbalances exist in covariate distributions between treatment groups. If covariate data are available before units are exposed to treatments, these chance imbalances can be mitigated by first checking covariate balance before the physical experiment takes place. Provided a precise definition of imbalance has been specified in advance, unbalanced randomizations can be discarded, followed by a rerandomization, and this process can continue until a randomization yielding balance according to the definition is achieved. By improving covariate balance, rerandomization provides more precise and trustworthy estimates of treatment effects.
[Keywords: randomization, treatment allocation, experimental design, clinical trial, causal effect, Mahalanobis distance, Hotelling’s T^{2}]
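The procedure is short enough to sketch directly (covariates, group sizes, and the acceptance threshold below are made up): redraw the 50/50 assignment until the Mahalanobis distance between group covariate means falls below the pre-specified threshold.

```python
# Sketch of rerandomization: keep redrawing the treatment assignment until the
# Mahalanobis balance statistic is below a threshold chosen before the experiment.
# Covariates and threshold here are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 5
X = rng.normal(size=(n, p))                  # hypothetical baseline covariates
threshold = 1.0                              # acceptance criterion, fixed in advance

cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis_balance(assign):
    diff = X[assign == 1].mean(axis=0) - X[assign == 0].mean(axis=0)
    scale = n / 4                            # = 1/(1/n_t + 1/n_c) for a 50/50 split
    return scale * diff @ cov_inv @ diff     # Morgan & Rubin's M statistic

draws = 0
while True:
    draws += 1
    assign = rng.permutation(np.repeat([0, 1], n // 2))
    if mahalanobis_balance(assign) < threshold:
        break

print(f"accepted a balanced randomization after {draws} draw(s), "
      f"M = {mahalanobis_balance(assign):.2f}")
```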
“Timing Technology: Lessons From The Media Lab”, Branwen 2012
Timing
: “Timing Technology: Lessons From The Media Lab”, (20120712; ; backlinks; similar):
Technological developments can be foreseen but the knowledge is largely useless because startups are inherently risky and require optimal timing. A more practical approach is to embrace uncertainty, taking a reinforcement learning perspective.
How do you time your startup? Technological forecasts are often surprisingly prescient in terms of predicting that something was possible & desirable and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.
Why is their knowledge so useless? Why are success and failure so intertwined in the tech industry? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts correct about the (eventual) outcome.
Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad because regardless of forecasting, a good idea will draw overlyoptimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. All major success stories overshadow their long list of predecessors who did the same thing, but got unlucky. The lesson of history is that for every lesson, there is an equal and opposite lesson. So, ideas can be divided into the overlyoptimistic & likely doomed, or the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception. Progress, then, depends on the ‘unreasonable man’.
This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically overexploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on societal level by serving as random exploration.
A major benefit of R&D, then, is in laying fallow until the ‘ripe time’ when they can be immediately exploited in previouslyunpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals.
“A/B Testing Longform Readability on Gwern.net”, Branwen 2012
ABtesting
: “A/B testing longform readability on Gwern.net”, (20120616; ; backlinks; similar):
A log of experiments done on the site design, intended to render pages more readable, focusing on the challenge of testing a static site, page width, fonts, plugins, and effects of advertising.
To gain some statistical & web development experience and to improve my readers’ experiences, I have been running a series of CSS A/B tests since June 2012. As expected, most do not show any meaningful difference.
 Background
 Problems with “conversion” metric
 Ideas for testing
 Testing
 Resumption: ABalytics
 Max-width
 Max-width redux
 Fonts
 Line height
 Null test
 Text & background color
 List symbol and fontsize
 Blockquote formatting
 Font size & ToC background
 Section header capitalization
 ToC formatting
 BeeLine Reader text highlighting
 Floating footnotes
 Indented paragraphs
 Sidebar elements
 Moving sidebar’s metadata into page
 CSE
 Banner Ad Effect on Total Traffic
 Deep reinforcement learning
 Indentation + LeftJustified Text
 Appendix
“Redshift Sleep Experiment”, Branwen 2012
Redshift
: “Redshift sleep experiment”, (20120509; ; backlinks; similar):
Selfexperiment on whether screentinting software such as Redshift/f.lux affect sleep times and sleep quality; Redshift lets me sleep earlier but doesn’t improve sleep quality.
I ran a randomized experiment with a free program (Redshift) which reddens screens at night to avoid tampering with melatonin secretion & sleep, from 2012–2013, measuring sleep changes with my Zeo. With 533 days of data, the main result is that Redshift causes me to go to sleep half an hour earlier but otherwise does not improve sleep quality.
“Learning Is Planning: near Bayesoptimal Reinforcement Learning via MonteCarlo Tree Search”, Asmuth & Littman 2012
“Learning is planning: near Bayesoptimal reinforcement learning via MonteCarlo tree search”, (20120214; ; similar):
Bayesoptimal behavior, while welldefined, is often difficult to achieve. Recent advances in the use of MonteCarlo tree search (MCTS) have shown that it is possible to act nearoptimally in Markov Decision Processes (MDPs) with very large or infinite state spaces. Bayesoptimal behavior in an unknown MDP is equivalent to optimal behavior in the known beliefspace MDP, although the size of this beliefspace MDP grows exponentially with the amount of history retained, and is potentially infinite. We show how an agent can use one particular MCTS algorithm, Forward Search Sparse Sampling (FSSS), in an efficient way to act nearly Bayesoptimally for all but a polynomial number of steps, assuming that FSSS can be used to act efficiently in any possible underlying MDP.
“Why Philosophers Should Care About Computational Complexity”, Aaronson 2011
“Why Philosophers Should Care About Computational Complexity”, (20110808; ; backlinks; similar):
One might think that, once we know something is computable, how efficiently it can be computed is a practical question with little further philosophical importance. In this essay, I offer a detailed case that one would be wrong. In particular, I argue that computational complexity theory—the field that studies the resources (such as time, space, and randomness) needed to solve computational problems—leads to new perspectives on the nature of mathematical knowledge, the strong AI debate, computationalism, the problem of logical omniscience, Hume’s problem of induction, Goodman’s grue riddle, the foundations of quantum mechanics, economic rationality, closed timelike curves, and several other topics of philosophical interest. I end by discussing aspects of complexity theory itself that could benefit from philosophical analysis.
“Does Retail Advertising Work? Measuring the Effects of Advertising on Sales Via a Controlled Experiment on Yahoo!”, Lewis & Reiley 2011
“Does Retail Advertising Work? Measuring the Effects of Advertising on Sales Via a Controlled Experiment on Yahoo!”, (20110608; ; backlinks; similar):
We measure the causal effects of online advertising on sales, using a randomized experiment performed in cooperation between Yahoo! and a major retailer.
After identifying over one million customers matched in the databases of the retailer and Yahoo!, we randomly assign them to treatment and control groups. We analyze individuallevel data on ad exposure and weekly purchases at this retailer, both online and in stores.
We find statisticallysignificant and economically substantial impacts of the advertising on sales. The treatment effect persists for weeks after the end of an advertising campaign, and the total effect on revenues is estimated to be more than seven times the retailer’s expenditure on advertising during the study. Additional results explore differences in the number of advertising impressions delivered to each individual, online and offline sales, and the effects of advertising on those who click the ads versus those who merely view them.
Statistical power calculations show that, due to the high variance of sales, our large number of observations brings us just to the frontier of being able to measure economically substantial effects of advertising.
We also demonstrate that without an experiment, using industrystandard methods based on endogenous crosssectional variation in advertising exposure, we would have obtained a wildly inaccurate estimate of advertising effectiveness.
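To see why “the high variance of sales” pushes such experiments to the edge of feasibility, a back-of-the-envelope two-sample power calculation with purely illustrative numbers (not the paper’s) already requires hundreds of thousands of users per arm:

```python
# Two-sample z-approximation sample-size calculation: per-person ad effects are
# tiny relative to the standard deviation of weekly purchases, so enormous
# samples are needed for 80% power at the conventional 5% level.
from scipy.stats import norm

def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Sample size per arm to detect a mean difference `effect` given outcome sd."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) * sd / effect) ** 2

print(n_per_arm(effect=0.10, sd=15.0))  # e.g. +$0.10/week on an SD of $15 → ≈350,000 per arm
```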
“PILCO: A ModelBased and DataEfficient Approach to Policy Search”, Deisenroth & Rasmussen 2011
2011deisenroth.pdf
: “PILCO: A ModelBased and DataEfficient Approach to Policy Search”, (20110601; ; backlinks; similar):
In this paper, we introduce PILCO, a practical, dataefficient modelbased policy search method. PILCO reduces model bias, one of the key problems of modelbased reinforcement learning, in a principled way.
By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into longterm planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using stateoftheart approximate inference. Furthermore, policy gradients are computed analytically for policy improvement.
We report unprecedented learning efficiency on challenging and highdimensional control tasks.
[Remarkably, PILCO can learn your standard “Cartpole” task within just a few trials by carefully building a Bayesian Gaussian process model and picking the maximallyinformative experiments to run. Cartpole is quite difficult for a human; incidentally, there’s an installation of one in the SF Exploratorium, and I just had to try it out once I recognized it. (My sampleefficiency was not better than PILCO.)]
“Here, There, and Everywhere: Correlated Online Behaviors Can Lead to Overestimates of the Effects of Advertising”, Lewis et al 2011
2011lewis.pdf
: “Here, there, and everywhere: correlated online behaviors can lead to overestimates of the effects of advertising”, (201103; ; backlinks; similar):
Measuring the causal effects of online advertising (adfx) on user behavior is important to the health of the WWW publishing industry. In this paper, using three controlled experiments, we show that observational data frequently lead to incorrect estimates of adfx. The reason, which we label “activity bias”, comes from the surprising amount of timebased correlation between the myriad activities that users undertake online.
In Experiment 1, users who are exposed to an ad on a given day are much more likely to engage in brandrelevant search queries as compared to their recent history, for reasons that had nothing to do with the advertisement. In Experiment 2, we show that activity bias occurs for page views across diverse websites. In Experiment 3, we track account signups at a competitor’s (of the advertiser) website and find that many more people sign up on the day they saw an advertisement than on other days, but that the true “competitive effect” was minimal.
In all three experiments, exposure to a campaign signals doing “more of everything” in a given period of time, making it difficult to find a suitable “matched control” using prior behavior. In such cases, the “match” is fundamentally different from the exposed group, and we show how and why observational methods lead to a massive overestimate of adfx in such circumstances.
[Keywords: advertising effectiveness, browsing behavior, causal inference, field experiments, selection bias]
“Improving Vineyard Sampling Efficiency via Dynamic Spatially Explicit Optimisation”, Meyers et al 2011
2011meyers.pdf
: “Improving vineyard sampling efficiency via dynamic spatially explicit optimisation”, J. M. Meyers, G. L. Sacks, H. M. Van Es, J. E. Vanden Heuvel (20110101)
“The Time Resolution of the St Petersburg Paradox”, Peters 2011
“The time resolution of the St Petersburg paradox”, (2011; ):
A resolution of the St Petersburg paradox is presented. In contrast to the standard resolution, utility is not required. Instead, the timeaverage performance of the lottery is computed. The final result can be phrased mathematically identically to Daniel Bernoulli’s resolution, which uses logarithmic utility, but is derived using a conceptually different argument. The advantage of the time resolution is the elimination of arbitrary utility functions.
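The contrast can be computed in a few lines. Below is my own sketch of the idea (not Peters’ notation): the ensemble-average payout has no finite limit, but the expected logarithmic growth rate of wealth for a player with wealth w paying ticket price c is finite, and whether playing is attractive depends on w, which is the content of the time resolution.

```python
# St Petersburg lottery: payout 2^k with probability 2^-k (k = flips to first heads).
import math

def expected_payout(max_k=60):
    # each term 2^-k * 2^k = 1, so the partial sums grow without bound
    return sum(0.5 ** k * 2 ** k for k in range(1, max_k + 1))

def time_average_growth_rate(w, c, max_k=60):
    # expected log growth of wealth per play: sum_k 2^-k * ln((w - c + 2^k) / w);
    # terms beyond k ≈ 60 are numerically negligible
    return sum(0.5 ** k * math.log((w - c + 2 ** k) / w) for k in range(1, max_k + 1))

print(expected_payout(20), expected_payout(60))   # 20.0 vs 60.0: no finite limit
print(time_average_growth_rate(w=100, c=2))       # positive: worth playing at this wealth
print(time_average_growth_rate(w=100, c=50))      # negative: ruinous at this wealth
```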
“How to Improve R&D Productivity: the Pharmaceutical Industry's Grand Challenge”, Paul et al 2010
2010paul.pdf
: “How to improve R&D productivity: the pharmaceutical industry's grand challenge”, (20100219; backlinks; similar):
 The biopharmaceutical industry is facing unprecedented challenges to its fundamental business model and currently cannot sustain sufficient innovation to replace its products and revenues lost due to patent expirations.
 The number of truly innovative new medicines approved by regulatory agencies such as the US Food and Drug Administration has declined substantially despite continued increases in R&D spending, raising the current cost of each new molecular entity (NME) to ~US$2.49^{$1.80}_{2010} billion.
 Declining R&D productivity is arguably the most important challenge the industry faces and thus improving R&D productivity is its most important priority.
 A detailed analysis of the key elements that determine overall R&D productivity and the cost to successfully develop an NME reveals exactly where (and to what degree) R&D productivity can (and must) be improved.
 Reducing latestage (Phase II and III) attrition rates and cycle times during drug development are among the key requirements for improving R&D productivity.
 To achieve the necessary increase in R&D productivity, R&D investments, both financial and intellectual, must be focused on the ‘sweet spot’ of drug discovery and early clinical development, from target selection to clinical proofofconcept.
 The transformation from a traditional biopharmaceutical FIPCo (fully integrated pharmaceutical company) to a FIPNet (fully integrated pharmaceutical network) should allow a given R&D organization to ‘play bigger than its size’ and to more affordably fund the necessary number and quality of pipeline assets.
The pharmaceutical industry is under growing pressure from a range of environmental issues, including major losses of revenue owing to patent expirations, increasingly costconstrained healthcare systems and more demanding regulatory requirements. In our view, the key to tackling the challenges such issues pose to both the future viability of the pharmaceutical industry and advances in healthcare is to substantially increase the number and quality of innovative, costeffective new medicines, without incurring unsustainable R&D costs. However, it is widely acknowledged that trends in industry R&D productivity have been moving in the opposite direction for a number of years.
Here, we present a detailed analysis based on comprehensive, recent, industrywide data to identify the relative contributions of each of the steps in the drug discovery and development process to overall R&D productivity. We then propose specific strategies that could have the most substantial impact in improving R&D productivity.
“Drug Harms in the UK: a Multicriteria Decision Analysis”, Nutt et al 2010
2010nutt.pdf
: “Drug harms in the UK: a multicriteria decision analysis”, (20100101)
“Adversarial Risk Analysis”, Insua et al 2009
2009insua.pdf
: “Adversarial Risk Analysis”, (2009; similar):
Applications in counterterrorism and corporate competition have led to the development of new methods for the analysis of decision making when there are intelligent opponents and uncertain outcomes.
This field represents a combination of statistical risk analysis and game theory, and is sometimes called adversarial risk analysis.
In this article, we describe several formulations of adversarial risk problems, and provide a framework that extends traditional risk analysis tools, such as influence diagrams and probabilistic reasoning, to adversarial problems.
We also discuss the research challenges that arise when dealing with these models, illustrate the ideas with examples from business, and point out relevance to national defense. [keywords: auctions, decision theory, game theory, influence diagrams]
“Retrospectives Guinnessometrics: The Economic Foundation of “Student’s” T”, Ziliak 2008
2008ziliak.pdf
: “Retrospectives Guinnessometrics: The Economic Foundation of “Student’s” t”, (200809; ; backlinks; similar):
In economics and other sciences, “statisticalsignificance” is by custom, habit, and education a necessary and sufficient condition for proving an empirical result (Ziliak and McCloskey, 2008; McCloskey & Ziliak, 1996). The canonical routine is to calculate what’s called a tstatistic and then to compare its estimated value against a theoretically expected value of it, which is found in “Student’s” t table. A result yielding a tvalue greater than or equal to about 2.0 is said to be “statisticallysignificant at the 95% level.” Alternatively, a regression coefficient is said to be “statisticallysignificantly different from the null, p < 0.05.” Canonically speaking, if a coefficient clears the 95% hurdle, it warrants additional scientific attention. If not, not. The first presentation of “Student’s” test of statisticalsignificance came a century ago, in “The Probable Error of a Mean” (1908b), published by an anonymous “Student.” The author’s commercial employer required that his identity be shielded from competitors, but we have known for some decades that the article was written by William Sealy Gosset (1876–1937), whose entire career was spent at Guinness’s brewery in Dublin, where Gosset was a master brewer and experimental scientist (E. S. Pearson, 1937). Perhaps surprisingly, the ingenious “Student” did not give a hoot for a single finding of “statistical” significance, even at the 95% level of statisticalsignificance as established by his own tables. Beginning in 1904, “Student”, who was a businessman besides a scientist, took an economic approach to the logic of uncertainty, arguing finally that statisticalsignificance is “nearly valueless” in itself.
“The Guidelines Manual  Chapter 8: Incorporating Health Economics in Guidelines and Assessing Resource Impact”, NICE 2007
2007niceguidelinesch8.pdf
: “The guidelines manual  Chapter 8: Incorporating health economics in guidelines and assessing resource impact”, NICE (20070413; ; backlinks)
“On the Evolution of Investment Strategies and the Kelly Rule—A Darwinian Approach”, Lensberg & SchenkHoppé 2007
2007lensberg.pdf
: “On the Evolution of Investment Strategies and the Kelly Rule—A Darwinian Approach”, Terje Lensberg, Klaus Reiner SchenkHoppé (20070101; backlinks)
“Information Systems Project Continuation in Escalation Situations: A Real Options Model”, Tiwana et al 2006
2006tiwana.pdf
: “Information Systems Project Continuation in Escalation Situations: A Real Options Model”, (20061009; ; backlinks; similar):
Software project escalation has been shown to be a widespread phenomenon. With few exceptions, prior research has portrayed escalation as an irrational decisionmaking process whereby additional resources are plowed into a failing project.
In this article, we examine the possibility that in some cases managers escalate their commitment not because they are acting irrationally, but rather as a rational response to real options that may be embedded in a project.
A project embeds real options when managers have the opportunity but not the obligation to adjust the future direction of the project in response to external or internal events. Examples include deferring the project, switching the project to serve a different purpose, changing the scale of the project, implementing it in incremental stages, abandoning the project, or using the project as a platform for future growth opportunities. Although real options can represent a substantial portion of a project’s value, they rarely enter a project’s formal justification process in the traditional quantitative discounted cashflowbased project valuation techniques.
Using experimental data collected from managers in 123 firms, we demonstrate that managers recognize and value the presence of real options. We also assess the relative importance that managers ascribe to each type of real option, showing that growth options are more highly valued than operational options. Finally, we demonstrate that the influence of the options on project continuation decisions is largely mediated by the perceived value that they add.
Implications for both theory and practice are discussed.
[Keywords: decision making, escalation, information integration, information systems, innovation management, investment decisions, project continuation, project management, real options]
“Decision by Sampling”, Stewart et al 2006
2006stewart.pdf
: “Decision by sampling”, (20060801; backlinks; similar):
We present a theory of decision by sampling (DbS) in which, in contrast with traditional models, there are no underlying psychoeconomic scales.
Instead, we assume that an attribute’s subjective value is constructed from a series of binary, ordinal comparisons to a sample of attribute values drawn from memory and is its rank within the sample. We assume that the sample reflects both the immediate distribution of attribute values from the current decision’s context and also the background, realworld distribution of attribute values.
DbS accounts for concave utility functions; losses looming larger than gains; hyperbolic temporal discounting; and the overestimation of small probabilities and the underestimation of large probabilities.
[Keywords: judgment, decision making, sampling, memory, utility, gains and losses, temporal discounting, subjective probability]
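A toy construction of my own (not from the paper) shows how rank-within-a-sample valuation yields diminishing marginal value without any underlying utility scale, provided the remembered comparison amounts are right-skewed, as real-world gains tend to be:

```python
# Decision by Sampling, crudely: an amount's subjective value is its relative
# rank within a sample of comparison amounts drawn from memory.
import numpy as np

rng = np.random.default_rng(0)
memory_sample = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)  # skewed "remembered" gains

def subjective_value(x, sample=memory_sample):
    return (sample < x).mean()       # proportion of remembered amounts this amount beats

for amount in [20, 40, 80, 160, 320]:
    print(amount, round(subjective_value(amount), 3))
# each doubling of the amount adds less and less subjective value: a concave
# value function emerges from ranks alone
```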
“The Optimizer’s Curse: Skepticism and Postdecision Surprise in Decision Analysis”, Smith & Winkler 2006
2006smith.pdf
: “The Optimizer’s Curse: Skepticism and Postdecision Surprise in Decision Analysis”, (20060301; ; backlinks; similar):
Decision analysis produces measures of value such as expected net present values or expected utilities and ranks alternatives by these value estimates. Other optimizationbased processes operate in a similar manner. With uncertainty and limited resources, an analysis is never perfect, so these value estimates are subject to error. We show that if we take these value estimates at face value and select accordingly, we should expect the value of the chosen alternative to be less than its estimate, even if the value estimates are unbiased. Thus, when comparing actual outcomes to value estimates, we should expect to be disappointed on average, not because of any inherent bias in the estimates themselves, but because of the optimizationbased selection process. We call this phenomenon the optimizer’s curse and argue that it is not well understood or appreciated in the decision analysis and management science communities. This curse may be a factor in creating skepticism in decision makers who review the results of an analysis.
In this paper, we study the optimizer’s curse and show that the resulting expected disappointment may be substantial. We then propose the use of Bayesian methods to adjust value estimates. These Bayesian methods can be viewed as disciplined skepticism and provide a method for avoiding this postdecision disappointment.
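The curse and its Bayesian correction are easy to reproduce by simulation. The sketch below uses illustrative normal priors and noise levels of my own choosing (not the paper’s examples): the chosen alternative’s raw estimate systematically overstates its true value, while shrinking every estimate toward the prior mean yields a calibrated value for whatever is chosen.

```python
# Optimizer's curse: pick the alternative with the highest unbiased estimate and
# you will be disappointed on average; normal-normal shrinkage removes the bias.
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_alternatives = 20_000, 10
tau, sigma = 1.0, 1.0                      # prior sd of true values, sd of estimation error

true = rng.normal(0, tau, size=(n_sims, n_alternatives))
est = true + rng.normal(0, sigma, size=true.shape)     # unbiased but noisy estimates

rows = np.arange(n_sims)
pick = est.argmax(axis=1)
print("estimate of chosen alternative:", est[rows, pick].mean())    # ≈ +2.2
print("true value of chosen alternative:", true[rows, pick].mean()) # ≈ +1.1: the curse

shrunk = est * tau**2 / (tau**2 + sigma**2)            # posterior mean given each estimate
pick_b = shrunk.argmax(axis=1)
print("shrunk estimate of chosen:", shrunk[rows, pick_b].mean())    # ≈ +1.1
print("true value of chosen (Bayes):", true[rows, pick_b].mean())   # ≈ +1.1: calibrated
```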
“Investing in the Unknown and Unknowable”, Zeckhauser 2006
2006zeckhauser.pdf
: “Investing in the Unknown and Unknowable”, (2006; ; similar):
From David Ricardo making a fortune buying British government bonds on the eve of the Battle of Waterloo to Warren Buffett selling insurance to the California earthquake authority, the wisest investors have earned extraordinary returns by investing in the unknown and the unknowable (UU). But they have done so on a reasoned, sensible basis. This essay explains some of the central principles that such investors employ. It starts by discussing “ignorance”, a widespread situation in the real world of investing, where even the possible states of the world are not known. Traditional finance theory does not apply in UU situations.
Strategic thinking, deducing what other investors might know or not, and assessing whether they might be deterred from investing, for example due to fiduciary requirements, frequently point the way to profitability. Most big investment payouts come when money is combined with complementary skills, such as knowing how to develop real estate or new technologies. Those who lack these skills can look for “sidecar” investments that allow them to put their money alongside that of people they know to be both capable and honest. The reader is asked to consider a number of such investments.
Central concepts in decision analysis, game theory, and behavioral decision are deployed alongside real investment decisions to unearth successful investment strategies. These strategies are distilled into 8 investment maxims. Learning to invest more wisely in a UU world may be the most promising way to substantially bolster your prosperity.
[Keywords: investing, unknown, unknowable, sidecar investment, fattailed distribution, Warren Buffett, Kelly Criterion, asymmetric information]
“The Kelly Criterion in Blackjack Sports Betting, and the Stock Market”, Thorp 2006
2006thorp.pdf
: “The Kelly Criterion in Blackjack Sports Betting, and the Stock Market”, (2006; similar):
[By Edward O. Thorp] The central problem for gamblers is to find positive expectation bets. But the gambler also needs to know how to manage his money, ie. how much to bet. In the stock market (more inclusively, the securities markets) the problem is similar but more complex. The gambler, who is now an “investor”, looks for “excess risk adjusted return”.
In both these settings, we explore the use of the Kelly criterion, which is to maximize the expected value of the logarithm of wealth (“maximize expected logarithmic utility”). The criterion is known to economists and financial theorists by names such as the “geometric mean maximizing portfolio strategy”, maximizing logarithmic utility, the growthoptimal strategy, the capital growth criterion, etc.
The author initiated the practical application of the Kelly criterion by using it for card counting in blackjack. We will present some useful formulas and methods to answer various natural questions about it that arise in blackjack and other gambling games. Then we illustrate its recent use in a successful casino sports betting system. Finally, we discuss its application to the securities markets where it has helped the author to make a 30 year total of 80 billion dollars worth of “bets”.
[Keywords: Kelly criterion, betting, long run investing, portfolio allocation, logarithmic utility, capital growth]
Abstract
Introduction
Coin tossing
Optimal growth: Kelly criterion formulas for practitioners
 The probability of reaching a fixed goal on or before n trials
 The probability of ever being reduced to a fraction x of this initial bankroll
 The probability of being at or above a specified value at the end of a specified number of trials
 Continuous approximation of expected time to reach a goal
 Comparing fixed fraction strategies: the probability that one strategy leads another after n trials
The long run: when will the Kelly strategy “dominate”?
Blackjack
Sports betting
Wall Street: the biggest game
 Continuous approximation
 The (almost) real world
 The case for “fractional Kelly”
 A remarkable formula
A case study
 The constraints
 The analysis and results
 The recommendation and the result
 The theory for a portfolio of securities
My experience with the Kelly approach
Conclusion
Acknowledgments
Appendix A: Integrals for deriving moments of E_{∞}
Appendix B: Derivation of formula (3.1)
Appendix C: Expected time to reach goal
References
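For the coin-tossing setting summarized in the abstract above, the standard textbook Kelly fraction and a quick simulation (my sketch, not Thorp’s own formulas or code) illustrate why both under-betting and over-betting grow wealth more slowly than full Kelly:

```python
# Kelly fraction for a bet paying b-to-1 with win probability p: f* = (p*b - (1-p)) / b.
import numpy as np

def kelly_fraction(p, b=1.0):
    return (p * b - (1 - p)) / b

def simulate_growth(f, p, b=1.0, rounds=10_000, seed=0):
    """Realized per-round log-growth of wealth betting a fixed fraction f."""
    rng = np.random.default_rng(seed)
    wins = rng.random(rounds) < p
    log_w = np.where(wins, np.log1p(f * b), np.log1p(-f)).sum()
    return log_w / rounds

p = 0.55                                   # 55% chance of winning an even-money bet
f_star = kelly_fraction(p)                 # = 0.10
for f in [0.5 * f_star, f_star, 2 * f_star]:
    print(f, simulate_growth(f, p))        # full Kelly has the highest growth rate;
                                           # 2x Kelly grows at roughly zero
```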
“Good and Real: Demystifying Paradoxes from Physics to Ethics”, Drescher 2006
2006dreschergoodandreal.pdf
: “Good and Real: Demystifying Paradoxes from Physics to Ethics”, (2006; backlinks; similar):
In Good and Real, a tourdeforce of metaphysical naturalism, computer scientist Gary Drescher examines a series of provocative paradoxes about consciousness, choice, ethics, quantum mechanics, and other topics, in an effort to reconcile a purely mechanical view of the universe with key aspects of our subjective impressions of our own existence.
Many scientists suspect that the universe can ultimately be described by a simple (perhaps even deterministic) formalism; all that is real unfolds mechanically according to that formalism. But how, then, is it possible for us to be conscious, or to make genuine choices? And how can there be an ethical dimension to such choices? Drescher sketches computational models of consciousness, choice, and subjunctive reasoning—what would happen if this or that were to occur?—to show how such phenomena are compatible with a mechanical, even deterministic universe.
Analyses of Newcomb’s Problem (a paradox about choice) and the Prisoner’s Dilemma (a paradox about selfinterest vs altruism, arguably reducible to Newcomb’s Problem) help bring the problems and proposed solutions into focus. Regarding quantum mechanics, Drescher builds on Everett’s relativestate formulation—but presenting a simplified formalism, accessible to laypersons—to argue that, contrary to some popular impressions, quantum mechanics is compatible with an objective, deterministic physical reality, and that there is no special connection between quantum phenomena and consciousness.
In each of several disparate but intertwined topics ranging from physics to ethics, Drescher argues that a missing technical linchpin can make the quest for objectivity seem impossible, until the elusive technical fix is at hand:
 Chapter 2 explores how inanimate, mechanical matter could be conscious, just by virtue of being organized to perform the right kind of computation.
 Chapter 3 explains why conscious beings would experience an apparent inexorable forward flow of time, even in a universe whose physical principles are timesymmetric and have no such flow, with everything sitting statically in spacetime.
 Chapter 4, following [Hugh] Everett, looks closely at the paradoxes of quantum mechanics, showing how some theorists came to conclude—mistakenly, I argue—that consciousness is part of the story of quantum phenomena, or vice versa. Chapter 4 also shows how quantum phenomena are consistent with determinism (even though socalled hiddenvariable theories of quantum determinism are provably wrong).
 Chapter 5 examines in detail how it can be that we make genuine choices in a mechanical, deterministic universe.
 Chapter 6 analyzes Newcomb’s Problem, a startling paradox that elicits some counterintuitive conclusions about choice and causality.
 Chapter 7 considers how our choices can have a moral component—that is, how even a mechanical, deterministic universe can provide a basis for distinguishing right from wrong.
 Chapter 8 wraps up the presentation and touches briefly on some concluding metaphysical questions.
“Policy Mining: Learning Decision Policies from Fixed Sets of Data”, Zadrozny 2003
2003zadrozny.pdf
: “Policy Mining: Learning Decision Policies from Fixed Sets of Data”, (2003; ; similar):
In this thesis we present a new data mining methodology for extracting decision policies from datasets containing descriptions of interactions with an environment. This methodology, which we call policy mining, is valuable for applications in which experimental interaction is not feasible but for which fixed sets of collected data are available. Examples of such applications are direct marketing, credit card fraud detection, recommender systems and medical treatment.
Recent advances in classifier learning and the availability of a great variety of offtheshelf learners make it attractive to use classifier learning as the core generalization tool in policy mining. However, in order to successfully apply classifier learning methods to policy mining, 3 important improvements to the current classifier learning technology are necessary.
First, standard classifier learners assume that all incorrect predictions are equally costly. This thesis presents 2 general methods for costsensitive learning that take into account the fact that misclassification costs are different for different examples and unknown for some examples. The methods we propose are evaluated carefully with experiments using large, difficult and highly costsensitive datasets from the direct marketing domain.
Second, most existing learning methods produce classifiers that output ranking scores along with the class label. These scores, however, are classifier dependent and cannot be easily combined with other sources of information for decisionmaking. This thesis presents a fast and effective calibration algorithm for transforming ranking scores into accurate class membership probability estimates. Experimental results using datasets from a variety of domains shows that the method produces probability estimates that are comparable to or better than the ones produced by other methods.
Finally, learning algorithms commonly assume that the available data consists of randomly drawn examples from the same underlying distribution of examples about which the learned model is expected to make predictions. In many situations, however, this assumption is violated because we do not have control over the data gathering process. This thesis formalizes the sample selection bias problem in machine learning and presents methods for learning and evaluation under sample selection bias.
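The score-calibration step can be illustrated with isotonic regression on held-out data, which is in the spirit of the thesis’s approach but stands in for its specific algorithm; the classifier, dataset, and bin-based reliability check below are my own illustrative choices.

```python
# Calibrating classifier scores into class-membership probabilities via a
# monotone (isotonic) map fit on a held-out calibration set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.4, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
raw = clf.predict_proba(X_cal)[:, 1]                 # ranking scores, often miscalibrated

iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y_cal)
calibrated = iso.predict(raw)                        # monotone map: score -> probability

# crude reliability check: average predicted probability vs observed rate per score bin
bins = np.digitize(raw, np.linspace(0, 1, 11))
for b in np.unique(bins):
    m = bins == b
    print(b, round(calibrated[m].mean(), 2), round(y_cal[m].mean(), 2))
```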
“John W. Tukey: His Life and Professional Contributions”, Brillinger 2002
2002brillinger.pdf
: “John W. Tukey: His Life and Professional Contributions”, David R. Brillinger (20021201)
“Stigler’s Diet Problem Revisited”, Garille & Gass 2001
2001garille.pdf
: “Stigler’s Diet Problem Revisited”, Susan Garner Garille, Saul I. Gass (20010101; ; backlinks)
“Should We Take Measurements at an Intermediate Design Point?”, Gelman 2000
2000gelman.pdf
: “Should we take measurements at an intermediate design point?”, (200003; similar):
It is well known that, for estimating a linear treatment effect with constant variance, the optimal design divides the units equally between the 2 extremes of the design space. If the doseresponse relation may be nonlinear, however, intermediate measurements may be useful in order to estimate the effects of partial treatments.
We consider the decision of whether to gather data at an intermediate design point: do the gains from learning about nonlinearity outweigh the loss in efficiency in estimating the linear effect?
Under reasonable assumptions about nonlinearity, we find that, unless sample size is very large, the design with no interior measurements is best, because with moderate total sample sizes, any nonlinearity in the doseresponse will be difficult to detect.
We discuss in the context of a simplified version of the problem that motivated this work—a study of pestcontrol treatments intended to reduce asthma symptoms in children.
[Keywords: asthma, Bayesian inference, doseresponse experimental design, pest control, statisticalsignificance.]
[See also: the “bet on sparsity principle”.]
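The efficiency side of the trade-off is easy to quantify. A toy calculation of my own (not Gelman’s analysis): if the dose-response really is linear, moving a third of the units to an interior design point inflates the variance of the estimated linear effect by 50%.

```python
# Variance of the OLS slope estimate under two allocations of 90 units on [-1, 1].
import numpy as np

def slope_variance(xs, sigma=1.0):
    X = np.column_stack([np.ones_like(xs), xs])      # intercept + dose
    return (sigma**2 * np.linalg.inv(X.T @ X))[1, 1]

n = 90
extremes = np.repeat([-1.0, 1.0], n // 2)            # all units at the two extremes
with_mid = np.repeat([-1.0, 0.0, 1.0], n // 3)       # a third moved to the interior point
print(slope_variance(extremes), slope_variance(with_mid))   # ratio = 1.5
```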
“Comparing Classifiers When the Misallocation Costs Are Uncertain”, Adams & Hand 1999
1999adams.pdf
: “Comparing classifiers when the misallocation costs are uncertain”, (19990701; similar):
Receiver Operating Characteristic (ROC) curves are popular ways of summarising the performance of two-class classification rules.
In fact, however, they are extremely inconvenient. If the relative severity of the two different kinds of misclassification is known, then an awkward projection operation is required to deduce the overall loss. At the other extreme, when the relative severity is unknown, the area under an ROC curve is often used as an index of performance. However, this essentially assumes that nothing whatsoever is known about the relative severity—a situation which is very rare in real problems.
We present an alternative plot which is more revealing than an ROC plot, and we describe a comparative index which allows one to take advantage of anything that may be known about the relative severity of the two kinds of misclassification.
[Keywords: ROC curve, error rate, loss function, misclassification costs, classification rule, supervised classification]
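In the spirit of the paper’s argument (though not its specific plot or index), one can compare classifiers by their minimum expected misclassification cost across a range of assumed cost ratios rather than by AUC alone; the data and classifiers below are synthetic stand-ins.

```python
# Compare two classifiers by best achievable expected cost at each assumed
# cost ratio (false negative cost : false positive cost), using their ROC points.
import numpy as np
from sklearn.metrics import roc_curve

def min_expected_cost(y, scores, cost_fn, cost_fp, prevalence):
    fpr, tpr, _ = roc_curve(y, scores)
    fnr = 1 - tpr
    costs = prevalence * fnr * cost_fn + (1 - prevalence) * fpr * cost_fp
    return costs.min()              # best threshold for that cost ratio

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
score_a = y + rng.normal(0, 1.0, size=y.size)     # hypothetical classifier A
score_b = y + rng.normal(0, 1.3, size=y.size)     # hypothetical, noisier classifier B

for ratio in [1, 2, 5, 10, 20]:
    a = min_expected_cost(y, score_a, cost_fn=ratio, cost_fp=1, prevalence=y.mean())
    b = min_expected_cost(y, score_b, cost_fn=ratio, cost_fp=1, prevalence=y.mean())
    print(ratio, round(a, 3), round(b, 3))        # A should dominate B at every ratio
```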
“Adding Risks: Samuelson's Fallacy of Large Numbers Revisited”, Ross 1999
1999ross.pdf
: “Adding Risks: Samuelson's Fallacy of Large Numbers Revisited”, Stephen A. Ross (19990101; backlinks)
“Information Theory and an Extension of the Maximum Likelihood Principle”, Akaike 1998
1998akaike.pdf
: “Information Theory and an Extension of the Maximum Likelihood Principle”, (1998; similar):
[From Selected Papers of Hirotugu Akaike, pg199–213; Originally published in Proceedings of the Second International Symposium on Information Theory, B.N. Petrov and F. Caski, eds., Akademiai Kiado, Budapest, 1973, 267–281]
In this paper it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion. This observation shows an extension of the principle to provide answers to many practical problems of statistical model fitting.
[Keywords: autoregressive model, final prediction error, maximum likelihood principle, statistical model identification, statistical decision function]
“‘Improving Ratings’: Audit in the British University System”, Strathern 1997
1997strathern.pdf
: “‘Improving ratings’: audit in the British University system”, (19970701; similar):
This paper gives an anthropological comment on what has been called the ‘audit explosion’, the proliferation of procedures for evaluating performance. In higher education the subject of audit (in this sense) is not so much the education of the students as the institutional provision for their education. British universities, as institutions, are increasingly subject to national scrutiny for teaching, research and administrative competence. In the wake of this scrutiny comes a new cultural apparatus of expectations and technologies. While the metaphor of financial auditing points to the important values of accountability, audit does more than monitor—it has a life of its own that jeopardizes the life it audits. The runaway character of assessment practices is analysed in terms of cultural practice. Higher education is intimately bound up with the origins of such practices, and is not just the latter day target of them.
…When a measure becomes a target, it ceases to be a good measure. The more a 2.1 examination performance becomes an expectation, the poorer it becomes as a discriminator of individual performances. Hoskin describes this as ‘Goodhart’s law’, after the latter’s observation on instruments for monetary control which lead to other devices for monetary flexibility having to be invented. However, targets that seem measurable become enticing tools for improvement. The linking of improvement to commensurable increase produced practices of wide application. It was that conflation of ‘is’ and ‘ought’, alongside the techniques of quantifiable written assessments, which led in Hoskin’s view to the modernist invention of accountability. This was articulated in Britain for the first time around 1800 as ‘the awful idea of accountability’ (Ref. 3, p. 268)
“The 'awful Idea of Accountability': Inscribing People into the Measurement of Objects”, Hoskin 1996
1996hoskin.pdf
: “The 'awful idea of accountability': inscribing people into the measurement of objects”, Keith Hoskin (19960101; backlinks)
“Seeing The Forest From The Trees: When Predicting The Behavior Or Status Of Groups, Correlate Means”, Lubinski & Humphreys 1996b
1996lubinski2.pdf
: “Seeing The Forest From The Trees: When Predicting The Behavior Or Status Of Groups, Correlate Means”, (1996; ; backlinks; similar):
When measures of individual differences are used to predict group performance, the reporting of correlations computed on samples of individuals invites misinterpretation and dismissal of the data. In contrast, if regression equations are used in which the correlations required are computed on bivariate means, as are the distribution statistics, it is difficult to underappreciate or lightly dismiss the utility of psychological predictors.
Given sufficient sample size and linearity of regression, this technique produces crossvalidated regression equations that forecast criterion means with almost perfect accuracy. This level of accuracy is provided by correlations approaching unity between bivariate samples of predictor and criterion means, and this holds true regardless of the magnitude of the “simple” correlation (eg. r_{xy} = 0.20, or r_{xy} = 0.80).
We illustrate this technique empirically using a measure of general intelligence as the predictor and other measures of individual differences and socioeconomic status as criteria. In addition to theoretical applications pertaining to group trends, this methodology also has implications for applied problems aimed at developing policy in numerous fields.
…To summarize, psychological variables generating modest correlations frequently are discounted by those who focus on the magnitude of unaccounted for criterion variance, large standard errors, and frequent false positive and false negative errors in predicting individuals. Dismissal of modest correlations (and the utility of their regressions) by professionals based on this psychometricstatistical reasoning has spread to administrators, journalists, and legislative policy makers. Some examples of this have been compiled by Dawes (1979, 1988) and Linn (1982). They range from squaring a correlation of 0.345 (ie. 0.12) and concluding that for 88% of students, “An SAT score will predict their grade rank no more accurately than a pair of dice” (cf. Linn, 1982, p. 280) to evaluating the differential utility of two correlations 0.20 and 0.40 (based on different procedures for selecting graduate students) as “twice of nothing is nothing” (cf. Dawes, 1979, p. 580).
…Tests are used, however, in ways other than the prediction of individuals or of a specific outcome for Johnny or Jane. And policy decisions based on tests frequently have broader implications for individuals beyond those directly involved in the assessment and selection context (see the discussion later in this article). For example, selection of personnel in education, business, industry, and the military focuses on the criterion performance of groups of applicants whose scores on selection instruments differ. Selection psychologists have long made use of modest predictive correlations when the ratio of applicants to openings becomes large. The relation of utility to size of correlation, relative to the selection ratio and base rate for success (if one ignores the test scores), is incorporated in the wellknown TaylorRussell (1939) tables. These tables are examples of how psychological tests have revealed convincingly economic and societal benefits (Hartigan & Wigdor 1989), even when a correlation of modest size remains at center stage. For example, given a base rate of 30% for adequate performance and a predictive validity coefficient of 0.30 within the applicant population, selecting the top 20% on the predictor test will result in 46% of hires ultimately achieving adequate performance (a 16% gain over base rate). To be sure, the prediction for individuals within any group is not strong—about 9% of the variance in job performance. Yet, when training is expensive or timeconsuming, this can result in huge savings. For analyses of groups composed of anonymous persons, however, there is a more unequivocal way of illustrating the importance of modest correlations than even the TaylorRussell tables provide.
Rationale for an Alternative Approach: Applied psychologists discovered decades ago that it is more advantageous to report correlations between a continuous predictor and a dichotomous criterion graphically rather than as a number that varies between zero and one. For example, the correlation (point biserial) of about 0.40 with the passfail pilot training criterion and an abilitystanine predictor looks quite impressive when graphed in the manner of Figure 1a. In contrast, in Figure 1b, a scatter plot of a correlation of 0.40 between two continuous measures looks at first glance like the pattern of birdshot on a target. It takes close scrutiny to perceive that the pattern in Figure 1b is not quite circular for the small correlation. Figure 1a communicates the information more effectively than Figure 1b. When the data on the predictive validity of the pilot abilitystanine were presented in the form of Figure 1a (rather than, say, as a scatter plot of a correlation of 0.40; Figure 1b), general officers in recruitment, training, logistics, and operations immediately grasped the importance of the data for their problems. Because the Army Air Forces were an attractive career choice, there were many more applicants for pilot training than could be accommodated and selection was required…A small gain on a criterion for a unit of gain on the predictor, as long as it is predicted with nearperfect accuracy, can have high utility.
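The Taylor-Russell figure quoted above is easy to check by Monte Carlo under the usual bivariate-normal assumptions (the simulation below is mine, not the authors’):

```python
# Taylor-Russell-style check: base rate 30%, predictive validity r = 0.30,
# select the top 20% on the predictor -> success rate among hires ≈ 46%.
import numpy as np

rng = np.random.default_rng(0)
n, r = 1_000_000, 0.30
x = rng.normal(size=n)                               # predictor (test score)
y = r * x + np.sqrt(1 - r**2) * rng.normal(size=n)   # criterion correlated r with x

success = y > np.quantile(y, 0.70)                   # 30% base rate of adequate performance
selected = x > np.quantile(x, 0.80)                  # hire the top 20% on the predictor

print(success.mean(), success[selected].mean())      # ≈ 0.30 overall vs ≈ 0.46 among hires
```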
“Processing Linguistic Probabilities: General Principles and Empirical Evidence”, Budescu & Wallsten 1995
1995budescu.pdf
: “Processing Linguistic Probabilities: General Principles and Empirical Evidence”, (1995; backlinks; similar):
This chapter discusses that practical issues arise because weighty decisions often depend on forecasts and opinions communicated from one person or set of individuals to another.
The standard wisdom has been that numerical communication is better than linguistic, and therefore, especially in important contexts, it is to be preferred. A good deal of evidence suggests that this advice is not uniformly correct and is inconsistent with strongly held preferences. A theoretical understanding of the preceding questions is an important step toward the development of means for improving communication, judgment, and decision making under uncertainty. The theoretical issues concern how individuals interpret imprecise linguistic terms, what factors affect their interpretations, and how they combine those terms with other information for the purpose of taking action. The chapter reviews the relevant literature in order to develop a theory of how linguistic information about imprecise continuous quantities is processed in the service of decision making, judgment, and communication.
It provides the current view, which has evolved inductively, to substantiate it where the data allow, and to suggest where additional research is needed. It also summarizes the research on meanings of qualitative probability expressions and compares judgments and decisions made on the basis of vague and precise probabilities.
“Computer Based Horse Race Handicapping and Wagering Systems: A Report”, Hausch et al 1994
1994benter.pdf
: “Computer Based Horse Race Handicapping and Wagering Systems: A Report”, Donald B. Hausch, Victor SY Lo, William T. Ziemba (19940101)
“Bayesian Updating in Hierarchic Markov Processes Applied to the Animal Replacement Problem”, Kristensen 1993
1993kristensen.pdf
: “Bayesian updating in hierarchic Markov processes applied to the animal replacement problem”, (19930601; similar):
The observed level of milk yield of a dairy cow or the litter size of a sow is only partially the result of a permanent characteristic of the animal; temporary effects are also involved. Thus, we face a problem concerning the proper definition and measurement of the traits in order to give the best possible prediction of the future revenues from an animal considered for replacement. A trait model describing the underlying effects is built into a model combining a Bayesian approach with a hierarchic Markov process in order to be able to calculate optimal replacement policies under various conditions.
“Learning from Coarse Information: Biased Contests and Career Profiles”, Meyer 1991
1991meyer.pdf
: “Learning from Coarse Information: Biased Contests and Career Profiles”, (1991; similar):
An organization’s promotion decision between 2 workers is modelled as a problem of boundedlyrational learning about ability. The decisionmaker can bias noisy rankorder contests sequentially, thereby changing the information they convey.
The optimal finalperiod bias favours the “leader”, reinforcing his likely ability advantage. When optimally biased rankorder information is a sufficient statistic for cardinal information, the leader is favoured in every period. In other environments, bias in early periods may (1) favour the early loser, (2) be optimal even when the workers are equally rated, and (3) reduce the favoured worker’s promotion chances.
“Weight or the Value of Knowledge”, Ramsey 1990
1990ramsey.pdf
: “Weight or the Value of Knowledge”, Frank P. Ramsey (19900101; ; backlinks)
“'Student': A Statistical Biography of William Sealy Gosset”, Pearson et al 1990
1990pearsonstudentastatisticalbiographyofwilliamsealygosset.pdf
: “'Student': A Statistical Biography of William Sealy Gosset”, Egon S. Pearson, R. L. Plackett, G. A. Barnard (19900101)
“F. P. Ramsey: Philosophical Papers”, Ramsey & Mellor 1990
1990mellorfrankramseyphilosophicalpapers.pdf
: “F. P. Ramsey: Philosophical Papers”, F. P. Ramsey, D. H. Mellor (19900101; ; backlinks)
“The Total Evidence Theorem for Probability Kinematics”, Graves 1989
1989graves.pdf
: “The Total Evidence Theorem for Probability Kinematics”, (198906; similar):
L. J. Savage and I. J. Good have each demonstrated that the expected utility of free information [Value of Information] is never negative for a decision maker who updates her degrees of belief by conditionalization on propositions learned for certain. In this paper Good’s argument is generalized to show the same result for a decision maker who updates her degrees of belief on the basis of uncertain information by Richard Jeffrey’s probability kinematics. The Savage/Good result is shown to be a special case of the more general result.
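The Savage/Good result for the simple conditionalization case can be illustrated with a two-state, two-action toy decision problem of my own construction: acting after observing the state can never have lower expected payoff than acting on the prior alone, so the expected value of (free) perfect information is non-negative.

```python
# Expected value of perfect information (EVPI) for a tiny decision problem.
import numpy as np

payoff = np.array([[10.0, -5.0],    # action 0's payoff in state 0, state 1
                   [ 0.0,  2.0]])   # action 1's payoff in state 0, state 1
prior = np.array([0.4, 0.6])        # prior probabilities of the two states

ev_no_info = (payoff @ prior).max()               # commit to the single best action now
ev_perfect = (payoff.max(axis=0) * prior).sum()   # pick the best action in each state
print(ev_no_info, ev_perfect, ev_perfect - ev_no_info)   # 1.2, 5.2, EVPI = 4.0 >= 0
```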
“Nonlinear Preference and Utility Theory”, Fishburn 1988
1988fishburnnonlinearpreferencesandutilitytheory.pdf
: “Nonlinear Preference and Utility Theory”, Peter C. Fishburn (19880101)
“Measuring the Vague Meanings of Probability Terms”, Wallsten et al 1986
1986wallsten.pdf
: “Measuring the vague meanings of probability terms”, (19861201; backlinks; similar):
Can the vague meanings of probability terms such as doubtful, probable, or likely be expressed as membership functions over the [0, 1] probability interval? A function for a given term would assign a membership value of 0 to probabilities not at all in the vague concept represented by the term, a membership value of 1 to probabilities definitely in the concept, and intermediate membership values to probabilities represented by the term to some degree.
A modified paircomparison procedure was used in 2 experiments to empirically establish and assess membership functions for several probability terms. Subjects performed 2 tasks in both experiments: They judged (1) to what degree one probability rather than another was better described by a given probability term, and (2) to what degree one term rather than another better described a specified probability. Probabilities were displayed as relative areas on spinners.
Task 1 data were analyzed from the perspective of conjointmeasurement theory, and membership function values were obtained for each term according to various scaling models. The conjointmeasurement axioms were well satisfied and goodnessoffit measures for the scaling procedures were high. Individual differences were large but stable. Furthermore, the derived membership function values satisfactorily predicted the judgments independently obtained in task 2.
The results support the claim that the scaled values represented the vague meanings of the terms to the individual subjects in the present experimental context. Methodological implications are discussed, as are substantive issues raised by the data regarding the vague meanings of probability terms.
Assessed membership functions over the [0,1] probability interval for several vague meanings of probability terms (eg. doubtful, probable, likely), using a modified paircomparison procedure in 2 experiments with 20 and 8 graduate business students, respectively. Subjects performed 2 tasks in both experiments: They judged (A) to what degree one probability rather than another was better described by a given probability term and (B) to what degree one term rather than another better described a specified probability. Probabilities were displayed as relative areas on spinners. Task A data were analyzed from the perspective of conjointmeasurement theory, and membership function values were obtained for each term according to various scaling models. Findings show that the conjointmeasurement axioms were well satisfied and goodnessoffit measures for the scaling procedures were high. Individual differences were large but stable, and the derived membership function values satisfactorily predicted the judgments independently obtained in Task B. Results indicated that the scaled values represented the vague meanings of the terms to the individual Ss in the present experimental context.
“An Examination of Two Alternative Techniques to Estimate the Standard Deviation of Job Performance in Dollars”, Reilly & Smither 1985
1985reilly.pdf
: “An examination of two alternative techniques to estimate the standard deviation of job performance in dollars”, (19851101; ; similar):
Two methods for estimating dollar standard deviations were investigated in a simulated environment. 19 graduate students with management experience managed a simulated pharmaceutical firm for 4 quarters. Ss were given information describing the performance of sales representatives on 3 job components. Estimates derived using the method developed by F. L. Schmidt et al 1979 (see record 1981–02231–001) were relatively accurate with objective sales data that could be directly translated to dollars, but resulted in overestimates of means and standard deviations when data were less directly translatable to dollars and involved variable costs. An additional problem with the Schmidt et al procedure involved the presence of outliers, possibly caused by differing interpretations of instructions. The CascioRamos estimate of performance in dollars (CREPID) technique, proposed by W. F. Cascio (1982), yielded smaller dollar standard deviations, but Ss could reliably discriminate among job components in terms of importance and could accurately evaluate employee performance on those components. Problems with the CREPID method included the underlying scale used to obtain performance ratings and a dependency on job component intercorrelations.
“Game Theoretic Analysis of a Bankruptcy Problem from the Talmud”, Aumann & Maschler 1985
1985aumann.pdf
: “Game theoretic analysis of a bankruptcy problem from the Talmud”, Robert J. Aumann, Michael Maschler (19850101)
“Influence Diagrams”, Howard & Matheson 1984
2005howard.pdf
: “Influence Diagrams”, Ronald A. Howard, James E. Matheson (19840101)
“The Citation Bias: Fad and Fashion in the Judgment and Decision Literature”, ChristensenSzalanski & Beach 1984
1984christensenszalanski.pdf
: “The Citation Bias: Fad and Fashion in the Judgment and Decision Literature”, (1984; ; similar):
Examined whether selectivity was used in the citing of evidence in research on the psychology of judgment and decision making and investigated the possible effects that this citation bias might have on the views of readers of the literature.
An analysis of the frequency of citations of goodperformance and poorperformance articles cited in the Social Sciences Citation Index 1972–1981 revealed that poorperformance articles were cited statisticallysignificantly more often than goodperformance articles.
80 members of the Judgment and Decision Making Society, a semiformal professional group, were asked to complete a questionnaire assessing the overall quality of human judgment and decisionmaking abilities on a scale from 0 to 100 and to list 4 examples of documented poor judgment or decisionmaking performance and 4 examples of good performance. Subjects recalled statisticallysignificantly more examples of poor than of good performance. Less experienced Subjects in the field appeared to have a lower opinion of human reasoning ability than did highly experienced Subjects. Also, Subjects recalled 50% more examples of poor performance than of good performance, despite the fact that the variety of poorperformance examples was limited.
It is concluded that there is a citation bias in the judgment and decisionmaking literature, and poorperformance articles are receiving most of the attention from other writers, despite equivalent proportions of each type in the journals.
“Readings on the Principles and Applications of Decision Analysis: Volume 2: Professional Collection”, Howard & Matheson 1983
1983howardreadingsondecisionanalysisv2.pdf
: “Readings on the Principles and Applications of Decision Analysis: Volume 2: Professional Collection”, Ronald H. Howard, James E. Matheson (19830101)
“Readings on the Principles and Applications of Decision Analysis: Volume 1: General Collection”, Howard & Matheson 1983
1983howardreadingsondecisionanalysisv1.pdf
: “Readings on the Principles and Applications of Decision Analysis: Volume 1: General Collection”, Ronald H. Howard, James E. Matheson (19830101)
“MultiBayesian Statistical Decision Theory”, Weerahandi & Zidek 1981
1981weerahandi.pdf
: “MultiBayesian Statistical Decision Theory”, S. Weerahandi, J. V. Zidek (19810101)
“Impact of Valid Selection Procedures on Workforce Productivity”, Schmidt et al 1979
1979schmidt.pdf
: “Impact of valid selection procedures on workforce productivity”, (1979; ; backlinks; similar):
Used decision theoretic equations to estimate the impact of the Programmer Aptitude Test (PAT) on productivity if used to select new computer programmers for 1 yr in the federal government and the national economy. A newly developed technique was used to estimate the standard deviation of the dollar value of employee job performance, which in the past has been the most difficult and expensive item of required information. For the federal government and the US economy separately, results are presented for different selection ratios and for different assumed values for the validity of previously used selection procedures. The impact of the PAT on programmer productivity was substantial for all combinations of assumptions. Results support the conclusion that hundreds of millions of dollars in increased productivity could be realized by increasing the validity of selection decisions in this occupation. Similarities between computer programmers and other occupations are discussed. It is concluded that the impact of valid selection procedures on workforce productivity is considerably greater than most personnel psychologists have believed.
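The decision-theoretic logic can be sketched with the standard Brogden-Cronbach-Gleser-style utility equation; the numbers below are made up for illustration and are not Schmidt et al’s actual estimates for the federal workforce.

```python
# Expected annual dollar gain from a more valid selection procedure:
# gain = (number hired) x (validity gain) x (SD of performance in dollars)
#        x (mean standardized predictor score of those hired).
from scipy.stats import norm

def utility_gain(n_hired, validity_gain, sd_dollars, selection_ratio):
    z_cut = norm.ppf(1 - selection_ratio)
    mean_z_selected = norm.pdf(z_cut) / selection_ratio   # average predictor score of hires
    return n_hired * validity_gain * sd_dollars * mean_z_selected

# e.g. 600 programmers hired, validity improved from 0.20 to 0.76,
# SD of yearly job-performance value ~$10,000, 1 in 5 applicants hired:
print(utility_gain(600, 0.76 - 0.20, 10_000, 0.20))       # ≈ $4.7 million per year
```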
“Science and Statistics”, Box 1976
1976box.pdf
: “Science and Statistics”, (19761201; similar):
Aspects of scientific method are discussed: In particular, its representation as a motivated iteration in which, in succession, practice confronts theory, and theory, practice. Rapid progress requires sufficient flexibility to profit from such confrontations, and the ability to devise parsimonious but effective models, to worry selectively about model inadequacies and to employ mathematics skillfully but appropriately. The development of statistical methods at Rothamsted Experimental Station by Sir Ronald Fisher is used to illustrate these themes.
…Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad… In applying mathematics to subjects such as physics or statistics we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless. The physicist knows that particles have mass and yet certain results, approximating what really happens, may be derived from the assumption that they do not. Equally, the statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world.
It follows that, although rigorous derivation of logical consequences is of great importance to statistics, such derivations are necessarily encapsulated in the knowledge that premise, and hence consequence, do not describe natural truth. It follows that we cannot know that any statistical technique we develop is useful unless we use it. Major advances in science, and in the science of statistics in particular, usually occur, therefore, as the result of the theory-practice iteration.
“When Values Conflict: Essays on Environmental Analysis, Discourse, and Decision”, Tribe et al 1976
1976tribewhenvaluesconflict.pdf
: “When Values Conflict: Essays on Environmental Analysis, Discourse, and Decision”, (1976; similar):
When Values Conflict: Essays on Environmental Analysis, Discourse, and Decision is a collection of essays, each of which addresses the issue of value conflicts in environmental disputes. The authors discuss the need to integrate such “fragile” values as beauty and naturalness with “hard” values such as economic efficiency in the decision-making process. The collection will be of interest to those who seek to include environmentalist values in public policy debates. This work comprises seven essays.
 In the first chapter, Robert Socolow discusses obstacles to the integration of environmental values into natural resource policy. Technical studies often fail to resolve conflicts, because such conflicts rest on the parties’ very different goals and values. Nonetheless, agreement on the technical analysis may serve as a platform from which to more clearly articulate value differences.
 Irene Thomson draws on the case of the Tocks Island Dam controversy to explore environmental decision-making processes. She describes the impact that the various parties’ interests and values have on their analyses, and argues that the fragmentation of responsibility among institutional actors contributes to the production of inadequate analyses.
 Tribe’s essay suggests that a natural environment has intrinsic value, a value that cannot be reduced to human interests. This recognition may serve as the first step in developing an environmental ethic.
 Charles Frankel explores the idea that nature has rights. He first explores the meaning of nature, by contrast with the supernatural, the technological, and the cultural. He suggests that appeals to nature’s rights serve as an appeal for “institutional protection against being carried away by temporary enthusiasms.”
 In Chapter Five, Harvey Brooks describes three main functions which analysis serves in the environmental decision-making process: it grounds conclusions in neutral, generally accepted principles, it separates means from ends, and it legitimates the final policy decision. If environmental values such as beauty, naturalness, and uniqueness are to be incorporated into systems analysis, they must be incorporated in such a way as to preserve these basic functions of analysis.
 Henry Rowen discusses the use of policy analysis as an aid to making environmental decisions. He describes the characteristics of a good analysis, and argues that good analysis can help clarify the issues, and assist in “the design and invention of objectives and alternatives.” Rowen concludes by suggesting ways of improving the field of policy analysis.
 Robert Dorfman provides the Afterword for this collection. This essay distinguishes between value and price, and explores the import of this distinction for cost-benefit analysis. The author concludes that there can be no “formula for measuring a project’s contribution to humane values.” Environmental decisions will always require the use of human judgement and wisdom.
When Values Conflict: Essays on Environmental Analysis, Discourse, and Decision offers a series of thoughtful essays on the nature and weight of environmentalist values. The essays range from a philosophic investigation of natural value to a more concrete evaluation of the elements of good policy analysis.
“Boundaries of Analysis: An Inquiry into the Tocks Island Dam Controversy”, Feiveson et al 1976
1976feivesonboundariesofanalysis.pdf
: “Boundaries of Analysis: An Inquiry into the Tocks Island Dam Controversy”, (1976; similar):
This is a study of what happens to technical analyses in the real world of politics. The Tocks Island Dam project proposed construction of a dam on the Delaware River at Tocks Island, five miles north of the Delaware Water Gap. Planned and developed in the early 1960s, it was initially considered a model of water resource planning. But it soon became the target of an extended controversy involving a tangle of interconnected concerns—floods and droughts, energy, growth, congestion, recreation, and the uprooting of people and communities. Numerous participants—economists, scientists, planners, technologists, bureaucrats and environmentalists—measured, modeled and studied the Tocks Island proposal. The results were a weighty legacy of technical and economic analyses—and a decade of political stalemate regarding the fate of the dam.
These analyses, to a substantial degree, masked the value conflicts at stake in the controversy; they concealed the real political and human issues of who would win and who would lose if the Tocks Island project were undertaken. And, the studies were infected by rigid categories of thought and divisions of bureaucratic responsibilities.
This collection of original essays tells the story of the Tocks Island controversy, with a fresh perspective on the environmental issues at stake. Its contributors consider the political decisionmaking process throughout the controversy and show how economic and technological analyses affected those decisions. Viewed as a whole, the essays show that systematic analysis and an explicit concern for human values need not be mutually exclusive pursuits.
“Portfolio Choice and the Kelly Criterion”, Thorp 1975
1975thorp.pdf
: “Portfolio Choice and the Kelly Criterion”, (1975; similar):
This chapter focuses on Kelly’s capital growth criterion for long-term portfolio growth.
The Kelly (Bernoulli-Latané or capital growth) criterion is to maximize the expected value E log X of the logarithm of the random variable X, representing wealth. The chapter presents a treatment of the Kelly criterion and Breiman’s results.
Breiman’s results can be extended to cover many if not most of the more complicated situations which arise in real-world portfolios. Specifically, the number and distribution of investments can vary with the time period, the random variables need not be finite or even discrete, and a certain amount of dependence can be introduced between the investment universes for different time periods. The chapter also discusses a few relationships between the max expected log approach and Markowitz’s mean-variance approach.
It highlights a few misconceptions concerning the Kelly criterion, the most notable being the fact that decisions that maximize the expected log of wealth do not necessarily maximize expected utility of terminal wealth for arbitrarily large time horizons.
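In the simplest case of a repeated binary bet at b-to-1 odds with win probability p, maximizing E log X numerically recovers the familiar closed-form Kelly fraction f* = p - (1-p)/b; a toy sketch (not Thorp’s portfolio treatment):

```python
import numpy as np

def expected_log_growth(f, p, b):
    """E[log wealth growth] per bet when staking a fraction f of wealth on a
    bet paying b-to-1 with win probability p (simple binary Kelly setting)."""
    return p * np.log(1 + f * b) + (1 - p) * np.log(1 - f)

p, b = 0.55, 1.0                       # invented win probability, even-money odds
fs = np.linspace(0, 0.99, 1_000)
f_star = fs[np.argmax([expected_log_growth(f, p, b) for f in fs])]
print(f_star, p - (1 - p) / b)         # numerical optimum ~ closed form p - q/b = 0.10
```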
“CrossModality Matching of Money Against Other Continua”, Galanter & Pliner 1974
1974galanter.pdf
: “CrossModality Matching of Money Against Other Continua”, (1974; similar):
Cross-modality matching of hypothetical increments of money against loudness recovers the previously proposed exponent of the utility function for money within a few percent. Similar cross-modality matching experiments for decrements give a disutility exponent of 0.59, larger than the utility exponent for increments. This disutility exponent was checked by an additional cross-modality matching experiment against the disutility of drinking various concentrations of a bitter solution. The parameter estimated in this fashion was 0.63.
Three experiments were conducted in which monetary increments and decrements were matched to either the loudness of a tone or the bitterness of various concentrations of sucrose octaacetate. An additional experiment involving ratio estimates of monetary loss is also reported. Results confirm that the utility function for both monetary increments and decrements is a power function with exponents less than one. The data further suggest that the exponent of the disutility function is larger than that of the utility function, i.e., the rate of change of ‘unhappiness’ caused by monetary losses is greater than the comparable rate of ‘happiness’ produced by monetary gains.
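The reported exponents describe power-function (dis)utility curves; a tiny illustration (the 0.59 loss exponent is the paper’s, but the gain exponent below is only a placeholder, since the abstract does not state its value):

```python
def money_utility(x, gain_exp=0.45, loss_exp=0.59):
    """Power-law utility for monetary gains and losses.
    loss_exp = 0.59 follows the abstract; gain_exp = 0.45 is a placeholder."""
    return x ** gain_exp if x >= 0 else -((-x) ** loss_exp)

# For amounts above $1, the larger loss exponent means losses loom larger:
print(money_utility(100), money_utility(-100))   # ~7.9 vs ~-15.1
```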
“The General Impossibility of Normative Accounting Standards”, Demski 1973
1973demski.pdf
: “The General Impossibility of Normative Accounting Standards”, Joel S. Demski (19731001; backlinks)
“The Theory of Social Choice”, Fishburn 1973
1973fishburntheoryofsocialchoice.pdf
: “The Theory of Social Choice”, Peter C. Fishburn (19730101)
“What Makes for a Beautiful Problem in Science?”, Samuelson 1970
1970samuelson.pdf
: “What Makes for a Beautiful Problem in Science?”, Paul A. Samuelson (19700101; ; backlinks)
“General Proof That Diversification Pays”, Samuelson 1967
1967samuelson.pdf
: “General Proof that Diversification Pays”, Paul Samuelson (19670101; backlinks)
“Optimal Dairy Cow Replacement Policies”, Giaever 1966
1966giaever.pdf
: “Optimal Dairy Cow Replacement Policies”, Harald Birger Giaever (19660101)
“Measuring Utility by a Singleresponse Sequential Method”, Becker et al 1964
1964becker.pdf
: “Measuring utility by a singleresponse sequential method”, (1964; ; backlinks; similar):
[Becker-DeGroot-Marschak mechanism] A person deciding on a career, a wife, or a place to live bases his choice on 2 factors: (1) How much do I like each of the available alternatives? and (2) What are the chances for a successful outcome of each alternative? These 2 factors comprise the utility of each outcome for the person making the choice. This notion of utility is fundamental to most current theories of decision behavior.
According to the expected utility hypothesis, if we could know the utility function of a person, we could predict his choice from among any set of actions or objects. But the utility function of a given subject is almost impossible to measure directly.
To circumvent this difficulty, stochastic models of choice behavior have been formulated which do not predict the subject’s choices but make statements about the probabilities that the subject will choose a given action. This paper reports an experiment to measure utility and to test one stochastic model of choice behavior.
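The elicitation procedure that became known as the Becker-DeGroot-Marschak (BDM) mechanism can be sketched in a few lines; this is an illustrative reconstruction of the incentive-compatible idea, not the paper’s exact experimental protocol:

```python
import random

def bdm_round(stated_price, play_lottery, price_range=(0.0, 10.0)):
    """One BDM round: the subject names a selling price for a lottery; a buy
    offer is drawn uniformly at random; if the offer meets the stated price,
    the subject is paid the *offer* (not the stated price), otherwise the
    lottery is played out. Truthfully stating one's certainty equivalent is
    therefore the optimal report. Illustrative sketch only."""
    offer = random.uniform(*price_range)
    if offer >= stated_price:
        return offer
    return play_lottery()

# Hypothetical lottery: $10 with probability 0.5, else $0.
payoff = bdm_round(4.50, lambda: 10.0 if random.random() < 0.5 else 0.0)
```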
“A Model for Selecting One of Two Medical Treatments”, Colton 1963
1963colton.pdf
: “A Model for Selecting One of Two Medical Treatments”, (1963; similar):
A simple cost function approach is proposed for designing an optimal clinical trial when a total of n patients with a disease are to be treated with one of two medical treatments.
The cost function is constructed with but one cost, the consequences of treating a patient with the superior or inferior of the two treatments. Fixed sample size and sequential trials are considered. Minimax, maximin, and Bayesian approaches are used for determining the optimal size of a fixed sample trial and the optimal position of the boundaries of a sequential trial.
Comparisons of the different approaches are made as well as comparisons of the results for the fixed and sequential plans.
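Colton derives closed-form optima; the underlying trade-off (patients ‘spent’ inside the trial versus patients misallocated afterwards) can be illustrated with a crude Monte Carlo stand-in, in which every number is invented:

```python
import numpy as np

def expected_inferior_count(n_per_arm, n_total, true_delta=0.5, sigma=1.0,
                            n_sims=20_000, seed=0):
    """Crude Monte Carlo version of the fixed-sample trade-off (Colton's paper
    works with closed forms; this is only an illustration). A trial assigns
    n_per_arm patients to each of two treatments; the remaining patients all
    receive whichever arm looked better. Returns the expected number of
    patients who end up on the truly inferior treatment."""
    rng = np.random.default_rng(seed)
    better = rng.normal(true_delta, sigma, (n_sims, n_per_arm)).mean(axis=1)
    worse  = rng.normal(0.0,        sigma, (n_sims, n_per_arm)).mean(axis=1)
    p_wrong = (worse > better).mean()          # trial picks the inferior arm
    remainder = n_total - 2 * n_per_arm
    return n_per_arm + p_wrong * remainder

# Larger trials waste more patients inside the trial but misallocate fewer of
# the rest; the optimum n balances the two.
costs = {n: expected_inferior_count(n, n_total=1_000) for n in (5, 10, 25, 50, 100)}
```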
“Studies of War, Nuclear and Conventional”, Blackett 1962
1962blackettstudiesofwarnuclearandconventional.pdf
: “Studies of War, Nuclear and Conventional”, Patrick Maynard Stuart Blackett (19620101)
“Applied Statistical Decision Theory”, Raiffa & Schlaifer 1961
1961raiffaappliedstatisticaldecisiontheory.pdf
: “Applied Statistical Decision Theory”, Howard Raiffa, Robert Schlaifer (19610101; backlinks)
“Gradient Theory of Optimal Flight Paths”, Kelley 1960
1960kelley.pdf
: “Gradient Theory of Optimal Flight Paths”, (19601001; ; backlinks; similar):
An analytical development of flight performance optimization according to the method of gradients or ‘method of steepest descent’ is presented. Construction of a minimizing sequence of flight paths by a stepwise process of descent along the local gradient direction is described as a computational scheme. Numerical application of the technique is illustrated in a simple example of orbital transfer via solar sail propulsion. Successive approximations to minimum-time planar flight paths from Earth’s orbit to the orbit of Mars are presented for cases corresponding to free and fixed boundary conditions on terminal velocity components.
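The ‘method of gradients’ is steepest descent applied to a discretized performance functional; a toy version on an ordinary function (rather than a flight path) shows the minimizing-sequence idea:

```python
import numpy as np

def steepest_descent(grad, x0, step=0.1, iters=200):
    """Build a minimizing sequence by repeatedly stepping along the local
    negative-gradient direction (toy illustration, not Kelley's trajectory
    formulation)."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x -= step * grad(x)
    return x

# Toy quadratic stand-in for the performance index, minimized at (3, -1):
print(steepest_descent(lambda x: 2 * (x - np.array([3.0, -1.0])), [0.0, 0.0]))
```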
“Testing Statistical Hypotheses (First Edition)”, Lehmann 1959
1959lehmanntestingstatisticalhypotheses.pdf
: “Testing Statistical Hypotheses (First Edition)”, E. L. Lehmann (19590101; ; backlinks)
“Probability and Statistics for Business Decisions: An Introduction to Managerial Economics Under Uncertainty”, Schlaifer 1959
1959schlaiferprobabilitystatisticsbusinessdecisions.pdf
: “Probability and Statistics for Business Decisions: An Introduction to Managerial Economics Under Uncertainty”, (1959; backlinks; similar):
This book is a non-mathematical introduction to the logical analysis of practical business problems in which a decision must be reached under uncertainty. The analysis which it recommends is based on the modern theory of utility and what has come to be known as the “personal” definition of probability; the author believes, in other words, that when the consequences of various possible courses of action depend on some unpredictable event, the practical way of choosing the “best” act is to assign values to consequences and probabilities to events and then to select the act with the highest expected value. In the author’s experience, thoughtful businessmen intuitively apply exactly this kind of analysis in problems which are simple enough to allow of purely intuitive analysis; and he believes that they will readily accept its formalization once the essential logic of this formalization is presented in a way which can be comprehended by an intelligent layman. Excellent books on the pure mathematical theory of decision under uncertainty already exist; the present text is an endeavor to show how formal analysis of practical decision problems can be made to pay its way.
From the point of view taken in this book, there is no real difference between a “statistical” decision problem in which a part of the available evidence happens to come from a ‘sample’ and a problem in which all the evidence is of a less formal nature. Both kinds of problems are analyzed by use of the same basic principles; and one of the resulting advantages is that it becomes possible to avoid having to assert that nothing useful can be said about a sample which contains an unknown amount of bias while at the same time having to admit that in most practical situations it is totally impossible to draw a sample which does not contain an unknown amount of bias. In the same way and for the same reason there is no real difference between a decision problem in which the long-run average demand for some commodity is known with certainty and one in which it is not; and not the least of the advantages which result from recognizing this fact is that it becomes possible to analyze a problem of inventory control without having to pretend that a finite amount of experience can ever give anyone perfect knowledge of long-run average demand. The author is quite ready to admit that in some situations it may be difficult for the businessman to assess the numerical probabilities and utilities which are required for the kind of analysis recommended in this book, but he is confident that the businessman who really tries to make a reasoned analysis of a difficult decision problem will find it far easier to do this than to make a direct determination of, say, the correct risk premium to add to the pure cost of capital or of the correct level at which to conduct a test of statistical significance.
In sum, the author believes that the modern theories of utility and personal probability have at last made it possible to develop a really complete theory to guide the making of managerial decisions—a theory into which the traditional disciplines of statistics and economics under certainty and the collection of miscellaneous techniques taught under the name of operations research will all enter as constituent parts. He hopes, therefore, that the present book will be of interest and value not only to students and practitioners of inventory control, quality control, marketing research, and other specific business functions but also to students of business and businessmen who are interested in the basic principles of managerial economics and to students of economics who are interested in the theory of the firm. Even the teacher of a course in mathematical decision theory who wishes to include applications as well as complete-class and existence theory may find the book useful as a source of examples of the practical decision problems which do arise in the real world.
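In the simplest case the recommended analysis reduces to assigning probabilities to events and dollar values to consequences, then picking the act with the highest expected value; a few lines of Python with invented numbers:

```python
def best_act(acts):
    """Return the act with the highest expected value, given a dict mapping
    each act to a list of (probability, dollar value) pairs. Illustrative only."""
    expected_value = lambda outcomes: sum(p * v for p, v in outcomes)
    return max(acts, key=lambda name: expected_value(acts[name]))

# Hypothetical decision: probabilities and consequences are made up.
acts = {"launch product": [(0.6, 500_000), (0.4, -200_000)],   # EV = $220,000
        "wait a year":    [(1.0, 50_000)]}                     # EV = $50,000
print(best_act(acts))   # -> "launch product"
```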
“An Optimum Character Recognition System Using Decision Functions”, Chow 1957
1957chow.pdf
: “An Optimum Character Recognition System Using Decision Functions”, (19571201; ; backlinks; similar):
The character recognition problem, usually resulting from characters being corrupted by printing deterioration and/or inherent noise of the devices, is considered from the viewpoint of statistical decision theory.
The optimization consists of minimizing the expected risk for a weight function which is preassigned to measure the consequences of system decisions. As an alternative, minimization of the error rate for a given rejection rate is used as the criterion. The optimum recognition system is thus obtained.
The optimum system consists of a conditional-probability densities computer; character channels, one for each character; a rejection channel; and a comparison network. Its precise structure and ultimate performance depend essentially upon the signals and noise structure.
Explicit examples for an additive Gaussian noise and a “cosine” noise are presented. Finally, an errorfree recognition system and a possible criterion to measure the character style and deterioration are presented.
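The rejection channel amounts to withholding a decision whenever even the most probable character is insufficiently probable (the rule now usually associated with Chow’s name); a schematic version, assuming the conditional-probability computer has already produced class posteriors:

```python
import numpy as np

def classify_with_reject(posteriors, reject_threshold=0.8):
    """Pick the character class with the highest posterior probability, but
    reject (defer) when even the best class falls below the threshold; the
    threshold trades error rate against rejection rate. Sketch only."""
    posteriors = np.asarray(posteriors)
    best = int(np.argmax(posteriors))
    if posteriors[best] < reject_threshold:
        return None                     # rejection channel
    return best

print(classify_with_reject([0.05, 0.90, 0.05]))   # -> 1 (accepted)
print(classify_with_reject([0.40, 0.35, 0.25]))   # -> None (rejected)
```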
“Unsolved Problems of Experimental Statistics”, Tukey 1954
1954tukey.pdf
: “Unsolved Problems of Experimental Statistics”, John W. Tukey (19540101; ; backlinks)
“NonCooperative Games”, Nash 1951
1951nash.pdf
: “NonCooperative Games”, (19510901; ; backlinks; similar):
…Our game theory, in contradistinction, is based on the absence of coalitions in that it is assumed that each participant acts independently, without collaboration or communication with any of the others.
The notion of an equilibrium point is the basic ingredient in our theory. This notion yields a generalization of the concept of the solution of a 2-person zero-sum game. It turns out that the set of equilibrium points of a 2-person zero-sum game is simply the set of all pairs of opposing ‘good strategies’.
In the immediately following sections we shall define equilibrium points and prove that a finite non-cooperative game always has at least one equilibrium point. We shall also introduce the notions of solvability and strong solvability of a non-cooperative game and prove a theorem on the geometrical structure of the set of equilibrium points of a solvable game.
As an example of the application of our theory we include a solution of a simplified 3-person poker game.
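For a finite game given in normal form, equilibrium points can be checked directly as mutual best responses; a small sketch for pure strategies only (Nash’s theorem guarantees existence only once mixed strategies are allowed):

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """Enumerate pure-strategy equilibrium points of a two-player game, where
    A[i, j] and B[i, j] are the row and column players' payoffs for the action
    pair (i, j). Illustrative sketch; may return an empty list."""
    A, B = np.asarray(A), np.asarray(B)
    return [(i, j)
            for i in range(A.shape[0])
            for j in range(A.shape[1])
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max()]

# Prisoner's dilemma: mutual defection (1, 1) is the unique equilibrium point.
A = [[3, 0], [5, 1]]   # row player's payoffs
B = [[3, 5], [0, 1]]   # column player's payoffs
print(pure_nash_equilibria(A, B))   # -> [(1, 1)]
```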
“The Economic Life of Industrial Equipment”, Preinreich 1940
1940preinreich.pdf
: “The Economic Life of Industrial Equipment”, Gabriel A. D. Preinreich (19400101)
“"Student" As Statistician”, Pearson 1939
1939pearson.pdf
: “"Student" as Statistician”, (19390100; ; backlinks; similar):
[Egon Pearson describes Student, or Gosset, as a statistician: Student corresponded widely with young statisticians and mathematicians, encouraging them, and having an outsized influence not reflected in his publications. Student’s preferred statistical tools were remarkably simple, focused on correlations and standard deviations, but wielded effectively in the analysis and efficient design of experiments (particularly agricultural experiments), and he was an early decision-theorist, focused on practical problems connected to his Guinness Brewery job—a detachment from academia which partially explains why he didn’t publish methods or results immediately or often. The need to handle the small samples of brewery work led to his work on small-sample approximations rather than, like Pearson et al in the Galton biometric tradition, relying on collecting large datasets and using asymptotic methods, and Student carried out one of the first Monte Carlo simulations.]
“Presidential Address to the First Indian Statistical Congress”, Fisher 1938
1938fisher.pdf
: “Presidential address to the first Indian statistical congress”, R. A. Fisher (19380101; backlinks)
“The Lanarkshire Milk Experiment”, Elderton 1933
1933elderton.pdf
: “The Lanarkshire Milk Experiment”, Ethel M. Elderton (19330101; backlinks)
“Pasteurised and Raw Milk”, Fisher & Bartlett 1931
1931fisher.pdf
: “Pasteurised and Raw Milk”, R. A. Fisher, S. Bartlett (19310101; backlinks)
“On Testing Varieties of Cereals”, Gosset 1923
1923student.pdf
: “On Testing Varieties of Cereals”, William Sealy Gosset (19230101; ; backlinks)
ThueMorse sequence
Thompson sampling
Optimal stopping
Multiobjective optimization
Monte Carlo tree search
Miscellaneous

2018cohen.pdf
(20180101) 
2017chupeau.pdf
(20170101; ; backlinks) 
2011ioannidis.pdf
(20110101; ; backlinks) 
1997mcclellandoptimalexperimentdesign.pdf
(19970101; backlinks) 
1939taylor.pdf
(19390101; ; backlinks) 
https://www.sumsar.net/blog/2015/01/probablepointsandcredibleintervalsparttwo/
( ) 
https://www.lesswrong.com/posts/rEZqP7K4MG6waC2zf/optimizingcropplantingwithmixedintegerlinear

https://www.chrisstucchio.com/blog/2014/equal_weights.html
( ; backlinks) 
https://proceedings.neurips.cc/paper/2010/file/edfbe1afcf9246bb0d40eb4d8027d90fPaper.pdf
( ) 
https://hope.econ.duke.edu/sites/hope.econ.duke.edu/files/Banzhaf.pdf
( ) 
https://constructionphysics.substack.com/p/thescienceofproduction
( ) 
https://80000hours.org/podcast/episodes/brianchristianalgorithmstoliveby/
( ) 
Localoptima
( ; backlinks) 
1995tengs.pdf
( ; backlinks) 
1995prattintroductionstatisticaldecisiontheory.epub
(backlinks) 
1986lehmanntestingstatisticalhypotheses.pdf
( ; backlinks) 
1984frey.pdf
( ; backlinks) 
1981frey.pdf
( ; backlinks) 
1968cohen.pdf
( ; backlinks) 
1965black.pdf
( ; backlinks) 
1957savage.pdf
( ; backlinks) 
1954hodges.pdf
( ; backlinks) 
MuggingDP
( ; backlinks; similar) 
MCTSAI
( ) 
Embryoediting
( ; backlinks; similar)