402 Citations Questioning the Indiscriminate Use of
Null Hypothesis Significance Tests in Observational Studies

(Compiled by Bill Thompson: thompson@uark.edu)
(updated 2/26/01)

In 1997, I compiled a list of articles, books, and book chapters that questioned the widespread use of null hypothesis significance tests (a.k.a. null hypothesis tests, significance tests) in scientific research. My goal was to provide those unfamiliar with this debate with a list of citations that pointed out the myriad problems associated with the indiscriminate use of null hypothesis tests. For balance, I also compiled a list of references that supported, at least to a limited extent, the use of null hypothesis tests.

Ironically, null hypothesis testing as it is currently practiced is a hybridization of R. A. Fisher’s significance test and J. Neyman and E. Pearson’s null hypothesis test (hence the label “null hypothesis significance test”). These two approaches were fundamentally different and were the source of heated debate between these two camps for many years (see Goodman 1993a for an excellent review of this historical debate). I sincerely doubt that the melding of these two approaches would have been acceptable to either Fisher or Neyman and Pearson.

Both the original list of 326 citations and the current one of 402 citations are strong evidence that null hypothesis testing has been, and continues to be, at the forefront of debate within a number of disciplines, especially the social sciences (see special features with pro/con articles on this topic in Morrison and Henkel 1970; Journal of Experimental Education 1993 [volume 61, no. 4]; Psychological Science 1997 [volume 8, no. 1]; Harlow et al. 1997; Behavioral and Brain Sciences 1998 [volume 21, no. 2]; and Research in the Schools 1998 [volume 5, no. 2]). Although in 1997 I noted a general lack of awareness of this debate within my own discipline of wildlife biology/ecology, there have recently been some rumblings here as well (e.g., see Cherry 1998, Johnson 1999, and Anderson et al. 2000). I was fortunate to co-chair, with Dr. Chris Ribic, a symposium on the use/misuse of null hypothesis testing in wildlife science during the Fifth Annual Conference of The Wildlife Society in Buffalo, NY on 26 September 1998. The symposium brought this important topic to the attention of many wildlife biologists for the first time, and ultimately led to Dr. Doug Johnson’s 1999 invited paper, which won the Outstanding Article award from The Wildlife Society. Particularly noteworthy are the comments by the new editor of the Journal of Wildlife Management, Dr. Leonard Brennan, in the January 2001 issue (p. 172), in which he recommended that prospective authors “Focus on establishing a meaningful effect size” and “Avoid excessive use of P-values”. Drs. David Anderson, Ken Burnham, and Doug Johnson have been (and continue to be) important drivers for these changes within the field of wildlife biology.

References

1. Altman, D. G. 1985. Discussion of Dr. Chatfield's paper. Journal of the Royal Statistical Society, Series A 148:242.

2. Altman, D. G., S. M. Gore, M. J. Gardner, and S. J. Pocock. 1983. Statistical guidelines for contributors to medical journals. British Medical Journal 286:1489-1493.

3. Amery, W. K., M. Hoing, M. Debroye, and F. Dom. 1987. Some comments on the use of statistics in the evaluation of drug trials in migraine. Neuroepidemiology 6:220-227.

4. Anderson, D. R., K. P. Burnham, and W. L. Thompson. 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64:912-923.

5. Anderson, W. T. 1992. Trouble in paradigms: robobuyer versus the blob - part 2. Marketing and Research Today 20(2):87-94.

6. Anscombe, F. J. 1956. Discussion on Dr. David's and Dr. Johnson's Paper. Journal of the Royal Statistical Society, Series B 18:24-27.

7. Bailar, J. C., and F. Mosteller. 1992. Guidelines for statistical reporting in articles for medical journals: amplifications and explanations. Pages 313-331 in J. C. Bailar and F. Mosteller, eds. Medical uses of statistics. Second ed. New England Journal of Medicine Books, Boston, Mass.

8. Bakan, D. 1966. The test of significance in psychological research. Psychological Bulletin 66:423-437.

9. Bakan, D. 1967. On method: toward a reconstruction of psychological investigation. Jossey-Bass, Inc., San Francisco, Calif. 178pp.

10. Bandt, C. L., and J. R. Boen. 1972. A prevalent misconception about sample size, statistical significance, and clinical importance. Journal of Periodontics 43:181-183.

11. Barnard, G. A. 1992. Statistics and OR - some needed interactions. Journal of the Operational Research Society 43:787-795.

12. Barndorff-Nielsen, O. 1977. Discussion of D. R. Cox's paper. Scandinavian Journal of Statistics 4:67-69.

13. Beaven, E. S. 1935. Discussion on Dr. Neyman's Paper. Journal of the Royal Statistical Society, Supplement 2:159-161.

14. Beck-Bornholdt, H.-P., and H.-H. Dubben. 1994. Potential pitfalls in the use of p-values in the interpretation of significance levels. Radiotherapy and Oncology 33:177-178.

15. Becker, G. 1991. Alternative methods of reporting research results. American Psychologist 46:654-655.

16. Bellhouse, D. R. 1993. Invited commentary: p values, hypothesis tests, and likelihood. American Journal of Epidemiology 137:497-499.

17. Berg, A. O. 1979. Some non-random views of statistical significance. Journal of Family Practice 8:1011-1014.

18. Berger, J. O. 1986. Are P-values reasonable measures of accuracy? Pages 21-27 in I. S. Francis, B. F. J. Manly, and F. C. Lam, eds. Pacific Statistical Congress. Elsevier Science Publ. Co., New York, N.Y.

19. Berger, J. O., and D. A. Berry. 1988. Statistical analysis and the illusion of objectivity. American Scientist 76:159-165.

20. Berger, J. O., and T. Sellke. 1987. Testing a point null hypothesis: the irreconcilability of P values and evidence. Journal of the American Statistical Association 82:112-122.

21. Berkson, J. 1938. Some difficulties of interpretation encountered in the application of the chi-square test. Journal of the American Statistical Association 33:526-536.

22. Berkson, J. 1942. Tests of significance considered as evidence. Journal of the American Statistical Association 37:325-335.

23. Berry, G. 1986. Statistical significance and confidence intervals. Medical Journal of Australia 144:618-619.

24. Binder, A. 1963. Further considerations on testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review 70:107-115.

25. Blalock, H. M., Jr. 1972. Social statistics. Second ed. McGraw-Hill, New York, N.Y.

26. Boardman, T. J. 1994. The statistician who changed the world: W. Edwards Deming, 1900-1993. American Statistician 48:179-187.

27. Borenstein, M. 1994. A note on the use of confidence intervals in psychiatric research. Psychopharmacology Bulletin 30:235-238.

28. Borenstein, M. 1994. The case for confidence intervals in controlled clinical trials. Controlled Clinical Trials 15:411-428.

29. Borenstein, M. 1997. Hypothesis testing and effect size estimation in clinical trials. Annals of Allergy, Asthma, & Immunology 78:5-16.

30. Borenstein, M. 1998. The shift from significance testing to effect size estimation. In N. Schooler, editor. Comprehensive clinical psychology. Volume 3: research methods. Pergamon, Oxford, U.K.

31. Boring, E. G. 1919. Mathematical versus scientific significance. Psychological Bulletin 16:335-338.

32. Box, G. E. P. 1983. An apology for ecumenism in statistics. Pages 51-84 in G. E. P. Box, T. Leonard and C. F. Wu, eds. Scientific inference, data analysis, and robustness. Academic Press, Inc., San Diego, Calif.

33. Box, G. E. P., W. G. Hunter, and J. S. Hunter. 1978. Statistics for experimenters: an introduction to design, data analysis, and model building. J. Wiley & Sons, Inc., New York, N.Y. 653pp.

34. Bozdogan, H. 1994. Editor’s general preface. Pages ix-xii in H. Bozdogan, ed. Engineering and scientific applications, Vol. 3. Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, Kluwer Academic Publ., Dordrecht, Netherlands.

35. Braithwaite, R. B. 1953. Scientific explanation: a study of the function of theory, probability and law in science. Cambridge University Press, Cambridge, U.K.

36. Braitman, L. E. 1993. Statistical estimates and clinical trials. Journal of Biopharmaceutical Statistics 3:249-256.

37. Branch, M. N. 1999. Statistical inference in behavior analysis: some things significance testing does and does not do. Behavior Analyst 22(2):87-92.

38. Brewer, J. K. 1985. Behavioral statistics textbooks: sources of myths and misconceptions? Journal of Educational Statistics 10:252-268.

39. Browner, W. S., and T. B. Newman. 1987. Are all significant P values created equal? The analogy between diagnostic tests and clinical research. Journal of the American Medical Association 257:2459-2463.

40. Bryan-Jones, J., and D. J. Finney. 1983. On an error in "Instructions to Authors". HortScience 18:279-282.

41. Bryk, A. S., and S. W. Raudenbush. 1988. Heterogeneity of variance in experimental studies: a challenge to conventional interpretations. Psychological Bulletin 104:396-404.

42. Buchanan-Wollaston, H. J. 1935. The philosophic basis of statistical analysis. Journal of the International Council for the Exploration of the Sea 10:249-263.

43. Burnham, K. P., and D. R. Anderson. 1998. Model selection and inference: a practical information-theoretic approach. Springer-Verlag, New York, N.Y. 353pp.

44. Cahan, S. 2000. Statistical significance is not a “Kosher Certificate” for observed effects: a critical analysis of the two-step approach to the evaluation of empirical results. Educational Researcher 29:31-34.

45. Camilleri, S. F. 1962. Theory, probability, and induction in social research. American Sociological Review 27:170-178.

46. Campillo, A. C. 1996. [Erroneous interpretation of p values.] [Spanish] Atencion Primaria 17:221-224.

47. Capone, C. A., Jr., and S. L. Seaman. 1989. Uses and misuses of hypothesis testing. Journal of Business Forecasting Methods and Systems 8:18-27.

48. Carver, R. P. 1978. The case against statistical significance testing. Harvard Educational Review 48:378-399.

49. Carver, R. P. 1993. The case against statistical significance testing, revisited. Journal of Experimental Education 61:287-292.

50. Casella, G., and R. L. Berger. 1987. Rejoinder. Journal of the American Statistical Association 82:133-135.

51. Chatfield, C. 1985. The initial examination of data (with discussion). Journal of the Royal Statistical Society, Series A 148:214-253.

52. Chatfield, C. 1989. Comments on the paper by McPherson. Journal of the Royal Statistical Society, Series A 152:234-238.

53. Chernoff, H. 1986. Comment. American Statistician 40:5-6.

54. Cherry, S. 1998. Statistical tests in publications of The Wildlife Society. Wildlife Society Bulletin 26:947-953.

55. Chew, V. 1976. Comparing treatment means: a compendium. HortScience 11:348-357.

56. Chew, V. 1980. Testing differences among means: correct interpretation and some alternatives. HortScience 15:467-470.

57. Cicchetti, D. V. 1998. Role of null hypothesis significance testing (NHST) in the design of neuropsychologic research. Journal of Clinical and Experimental Neuropsychology 20:293-295.

58. Clark, C. A. 1963. Hypothesis testing in relation to statistical methodology. Review of Educational Research 33:455-473.

59. Clark, C. M. 1999. Further considerations of null hypothesis testing. Journal of Clinical and Experimental Neuropsychology 21:283-284.

60. Coats, W. 1970. A case against the normal use of inferential statistical models in educational research. Educational Researcher (June):6-7.

61. Cochran, W. G., and G. M. Cox. 1957. Experimental designs. Second ed. J. Wiley & Sons, Inc., New York, N.Y. 611pp.

62. Cohen, J. 1965. Some statistical issues in psychological research. Pages 95-121 in B. B. Wolman, ed. Handbook of clinical psychology. McGraw-Hill, New York, N.Y.

63. Cohen, J. 1990. Things I have learned (so far). American Psychologist 45:1304-1312.

64. Cohen, J. 1994. The earth is round (p<.05). American Psychologist 49:997-1003.

65. Connolly, R. A. 1991. A posterior odds analysis of the weekend effect. Journal of Econometrics 49:51-104.

66. Cooke, R. W., and A. M. Weindling. 1993. Clinical trials and P values. Pediatrics 92:188-189.

67. Cormack, R. M. 1985. Discussion of Dr. Chatfield's paper. Journal of the Royal Statistical Society, Series A 148:231-233.

68. Cowger, C. D. 1984. Statistical significance tests: scientific ritualism or scientific method? Social Service Review 58:358-372.

69. Cox, D. R. 1958. Some problems connected with statistical inference. Annals of Mathematical Statistics 29:357-372.

70. Cox, D. R. 1977. The role of significance tests (with discussion). Scandinavian Journal of Statistics 4:49-70.

71. Cox, D. R. 1982. Statistical significance tests. British Journal of Clinical Pharmacology 14:325-331.

72. Cox, D. R. 1986. Some general aspects of the theory of statistics. International Statistical Review 54:117-126.

73. Cox, D. R., and E. J. Snell. 1981. Applied statistics: principles and examples. Chapman and Hall, London, U.K. 189pp.

74. Crane, J. A. 1980. Relative likelihood analysis versus significance tests. Evaluation Review 4:824-842.

75. Cronbach, L. J. 1975. Beyond the two disciplines of scientific psychology. American Psychologist 30:116-127.

76. Cutler, S. J., S. W. Greenhouse, J. Cornfield, and M. A. Schneiderman. 1966. The role of hypothesis testing in clinical trials. Journal of Chronic Diseases 19:857-882.

77. Daniel, L. G. 1998. Statistical significance testing: a historical overview of misuse and misinterpretation with implications for the editorial policies of educational journals. Research in the Schools 5(2):23-32.

78. Daniel, W. W. 1977. Statistical significance versus practical significance. Science Education 61:423-427.

79. Dar, R. 1987. Another look at Meehl, Lakatos, and the scientific practices of psychologists. American Psychologist 42:145-151.

80. Dar, R., R. C. Serlin, and H. Omer. 1994. Misuse of statistical tests in three decades of psychotherapy research. Journal of Consulting and Clinical Psychology 62:75-82.

81. DeGroot, M. H. 1989. Probability and statistics. Addison-Wesley, Reading, Mass.

82. DeLong, J. B., and K. Lang. 1992. Are all economic hypotheses false? Journal of Political Economy 100:1257-1272.

83. Deming, W. E. 1975. On probability as a basis for action. American Statistician 29:146-152.

84. Diamond, G., and J. Forrester. 1983. Clinical trials and statistical verdicts: probable grounds for appeal. Annals of Internal Medicine 98:385-394.

85. Dill, C. V. B. Whittaker, and J. M. Lancaster. 1998. Statistical inference: a comparison of hypothesis testing and estimation. Insight 23(3):79-83.

86. Dixon, P., and T. O’Reilly. 1999. Scientific versus statistical significance. Canadian Journal of Experimental Psychology 53:133-149.

87. Donders, J. 2000. From null hypothesis to clinical significance. Journal of Clinical and Experimental Neuropsychology 22:265-266.

88. Dyer, I. 1998. The significance of statistical significance. Accident and Emergency Nursing 6(2):92-98.

89. Edwards, A. W. F. 1972. Likelihood. Cambridge Univ. Press, Cambridge, U.K.

90. Edwards, W. 1965. Tactical note on the relation between scientific and statistical hypotheses. Psychological Bulletin 63:400-402.

91. Edwards, W. 1995. Number magic, auditing acid and materiality: a challenge for auditing research. Auditing 14:176-187.

92. Edwards, W., H. Lindman, and L. J. Savage. 1963. Bayesian statistical inference for psychological research. Psychological Review 70:193-242.

93. Ellison, A. M. 1996. An introduction to Bayesian inference for ecological research and environmental decision-making. Ecological Applications 6:1036-1046.

94. Erhardt, C. 1959. Statistics, a trap for the unwary. Obstetrics and Gynecology 14:549-554.

95. Evans, S. J. W., P. Mills, and J. Dawson. 1988. The end of the p value? British Heart Journal 60:177-180.

96. Falk, R. 1998. In criticism of the null hypothesis statistical test. American Psychologist 53:798-799.

97. Falk, R., and C. W. Greenbaum. 1995. Significance tests die hard: the amazing persistence of a probabilistic misconception. Theory and Psychology 5:75-98.

98. Favreau, O. E. 1993. Do the Ns justify the means? Null hypothesis testing applied to sex and other differences. Canadian Psychology 34:64-78.

99. Favreau, O. E. 1997. Sex and gender comparisons: Does null hypothesis testing create a false dichotomy? Feminism and Psychology 7:63-81.

100. Feinstein, A. R. 1977. Clinical biostatistics. C. V. Mosby, St. Louis, Mo.

101. Feinstein, A. R. 1978. Clinical biostatistics: stochastic significance, apposite data, and some remedies for the intellectual pollutants of statistical vocabulary. Clinical Pharmacology and Therapeutics 22:113-123.

102. Feinstein, A. R. 1985. Clinical epidemiology: the architecture of clinical research. W. B. Saunders Co., Philadelphia, Penn. 812pp.

103. Felson, D. T., J. J. Anderson, and R. F. Meenan. 1990. Time for changes in the design, analysis, and reporting of rheumatoid arthritis clinical trials. Arthritis and Rheumatism 33:140-149.

104. Finney, D. J. 1988. Was this in your statistics textbook? III. Design and analysis. Experimental Agriculture 24:421-432.

105. Finney, D. J. 1989a. Was this in your statistics textbook? VI. Regression and covariance. Experimental Agriculture 25:291-311.

106. Finney, D. J. 1989b. Is the statistician still necessary? Biom. Praxim. 29:135-146.

107. Folger, R. 1989. Significance tests and the duplicity of binary decisions. Psychological Bulletin 106:155-160.

108. Freedman, D., R. Pisani, and R. Purves. 1978. Statistics. Norton Publ. Co., New York, N.Y.

109. Freeman, P. R. 1993. The role of p-values in analysing trial results. Statistics in Medicine 12:1443-1452.

110. Friedman, M. 1988. Money and the stock market. Journal of Political Economy 96:221-239.

111. Friedman, S. B., and S. Phillips. 1981. What’s the difference? Pediatric residents and their inaccurate concepts regarding statistics. Pediatrics 68:644-646.

112. Gardner, M. J., and D. G. Altman. 1986. Confidence intervals rather than P values: estimation rather than hypothesis testing. British Medical Journal 292:746-750.

113. Gardner, M. J., and D. G. Altman. 1989. Estimation rather than hypothesis testing: confidence intervals rather than P values. Pages 6-19 in M. J. Gardner and D. G. Altman, eds. Statistics with confidence - confidence intervals and statistical guidelines. British Medical Journal, London, U.K.

114. Gauch Jr., H. G. 1988. Model selection and validation for yield trials with interaction. Biometrics 44:705-715.

115. Geary, R. C. 1947. Testing for normality. Biometrika 34:209-242.

116. Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 1995. Bayesian data analysis. Chapman and Hall, London, U.K. 526pp.

117. Gibbons, J. D., and J. W. Pratt. 1975. P-values: interpretation and methodology. American Statistician 29:20-25.

118. Gigerenzer, G. 1991. From tools to theories: a heuristic of discovery in cognitive psychology. Psychological Review 98:254-267.

119. Gigerenzer, G. 1993. The superego, the ego, and the id in statistical reasoning. Pages 311-339 in G. Keren and C. Lewis, eds. A handbook for data analysis in the behavioral sciences: methodological issues. Lawrence Erlbaum, Hillsdale, N.J.

120. Gigerenzer, G., and D. J. Murray. 1987. Cognition as intuitive statistics. Erlbaum, Hillsdale, N.J. 214pp.

121. Gigerenzer, G., Z. Swijtink, T. Porter, L. Daston, J. Beatty, and L. Kruger. 1989. The empire of chance: how probability changed science and everyday life. Cambridge University Press, Cambridge, U.K.

122. Gill, J. 1999. The insignificance of null hypothesis significance testing. Political Research Quarterly 52:647-674.

123. Gill, M. 1993. The significance of “significance.” Edinburgh Working Papers in Applied Linguistics 4:63-80.

124. Glaser, D. N. 1999. The controversy of significance testing: misconceptions and alternatives. American Journal of Critical Care 8(5):291-296.

125. Glass, G. V., B. McGaw, and M. L. Smith. 1981. Meta-analysis in social research. Sage Publ., Beverly Hills, Calif. 279pp.

126. Gliner, J. A., G. A. Morgan, N. L. Leech, and R. J. Harmon. 2001. Problems with null hypothesis significance testing. Journal of the American Academy of Child and Adolescent Psychiatry 40:250-252.

127. Gold, D. 1958. Comment on “A critique of tests of significance.” American Sociological Review 23:85-86.

128. Gold, D. 1969. Statistical tests and substantive significance. American Sociologist 4:42-46.

129. Goldberger, A. S. 1991. A course in econometrics. Harvard Univ. Press, Cambridge, Mass.

130. Good, I. J. 1983. Good thinking: the foundations of probability and its applications. Univ. Minnesota Press, Minneapolis, Minn.

131. Goodman, S. N. 1992. A comment on replication, p-values, and evidence. Statistics in Medicine 11:875-879.

132. Goodman, S. N. 1993a. P values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. American Journal of Epidemiology 137:485-496.

133. Goodman, S. N. 1993b. Author’s response to “Invited commentary: p values, hypothesis tests, and likelihood”. American Journal of Epidemiology 137:500-501.

134. Goodman, S. N., and R. Royall. 1988. Evidence and scientific research. American Journal of Public Health 78:1568-1574.

135. Gore, S. M. 1981. Assessing clinical trials - trial size. British Medical Journal 282:1687-1689.

136. Gower, J. C. 1983. Data analysis: multivariate or univariate and other difficulties. Pages 39-67 in H. Martens and H. Russwurm, Jr., eds. Food research and data analysis. Applied Science, London, U.K.

137. Granger, C. W. J., M. L. King, and H. White. 1995. Comments on testing economic theories and the use of model selection criteria. Journal of Econometrics 67:173-187.

138. Grant, D. A. 1962. Testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review 69:54-61.

139. Graybill, F. A. 1976. Theory and application of the linear model. Duxbury Press, Mass. 704pp.

140. Graybill, F. A., and H. K. Iyer. 1994. Regression analysis: concepts and applications. Duxbury Press, Belmont, Calif. 701pp.

141. Greenfield, M. L. V. H., J. E. Kuhn, and E. M. Wojtys. 1996. Current concepts. A statistics primer. P values: probability and clinical significance. American Journal of Sports Medicine 24:863-865.

142. Greenland, S. 1989. Modeling and variable selection in epidemiologic analysis. American Journal of Public Health 79:340-349.

143. Greenwald, A. G. 1993. Consequences of prejudice against the null hypothesis. Pages 419-448 in G. Keren and C. Lewis, eds. A handbook for data analysis in the behavioral sciences: methodological issues. Lawrence Erlbaum, Hillsdale, N.J.

144. Greenwald, A. G., R. Gonzalez, R. J. Harris, and D. Guthrie. 1996. Effect sizes and p values: what should be reported and what should be replicated? Psychophysiology 33:175-183.

145. Guttman, L. 1977. What is not what in statistics. The Statistician 26:81-107.

146. Guttman, L. 1985. The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis 1:3-10.

147. Hacking, I. 1965. Logic of statistical inference. Cambridge Univ. Press, Cambridge, U.K. 232pp.

148. Hahn, G. J. 1974. Don’t let statistical significance fool you! Chemtech 4:16-18.

149. Hahn, G. J. 1990. Commentary. Technometrics 32:257-258.

150. Hahn, G. J., and W. Q. Meeker. 1991. Statistical intervals: a guide for practitioners. J. Wiley & Sons, Inc., New York, N.Y. 392pp.

151. Hall, P., and B. Selinger. 1986. Statistical significance: balancing evidence against doubt. Australian Journal of Statistics 28:354-370.

152. Hammond, G. 1996. The objections to null hypothesis testing as a means of analysing psychological data. Australian Journal of Psychology 48:104-106.

153. Hansen, M. H., and W. E. Edwards. 1950. On the important limitation to the use of data from samples. Bulletin de L’Institute International de Statistique, Bern. Pp. 214-219.

154. Harlow, L. L., S. A. Mulaik, and J. H. Steiger, eds. 1997. What if there were no significance tests? Lawrence Erlbaum Associates, Mahwah, N.J. 446pp. (Pro and con)

155. Harris, M. J. 1991. Significance tests are not enough: the role of effect-size estimation in theory corroboration. Theory and Psychology 1:375-382.

156. Hauschke, D., and V. W. Steinijans. 1996. A note on conventional null hypothesis testing in active control equivalence studies. Controlled Clinical Trials 17:347-348.

157. Hays, W. L. 1963. Statistics for psychologists. J. Wiley and Sons, Inc., New York, N.Y. 719pp.

158. Healy, M. J. R. 1978. Is statistics a science? Journal of the Royal Statistical Society, Series A 141:385-393.

159. Healy, M. J. R. 1989. Comments on the paper by McPherson. Journal of the Royal Statistical Society, Series A 152:232-234.

160. Henderson, A. R. 1993. Chemistry with confidence: should Clinical Chemistry require confidence intervals for analytical and other data? Clinical Chemistry 39:929-935.

161. Henkel, R. E. 1976. Tests of significance. Sage Publ., Inc., Beverly Hills, Calif. 92pp.

162. Herrera, C. D. 1996. An ethical argument against leaving psychologists to their statistical devices. Journal of Psychology 130(2):125-130.

163. Hilborn, R. 1997. Statistical essay - statistical hypothesis testing and decision theory in fisheries science. Fisheries 22(10):19-20.

164. Hinkley, D. V. 1987. Comment. Journal of the American Statistical Association 82:128-129.

165. Hodges Jr., J. L., and E. L. Lehmann. 1954. Testing the approximate validity of statistical hypotheses. Journal of the Royal Statistical Society, Series B 16:261-268.

166. Hogben, L. 1957a. Statistical theory. Allen and Unwin, London, U.K.

167. Holmes, C. B., J. S. Kixmiller, and R. K. Larsen. 1989. Statistical versus clinical significance in research with the MMPI. Psychological Reports 64:159-162.

168. Hubbard, R., R. A. Parsa, and M. R. Luthy. 1997. The spread of statistical significance testing in psychology: the case of the Journal of Applied Psychology, 1917-1994. Theory and Psychology 7:545-554.

169. Hubbard, R., and P. A. Ryan. 2000. The historical growth of statistical significance testing in psychology – and its future prospects. Educational and Psychological Measurement 60:661-681.

170. Huberty, C. J. 1987. On statistical testing. Educational Researcher 16:4-9.

171. Hunter, J. E. 1997. Needed: a ban on the significance test. Psychological Science 8:3-7.

172. Hunter, J. S. 1990. Commentary. Technometrics 32:261.

173. Inman, H. F. 1994. Karl Pearson and R. A. Fisher on statistical tests: A 1935 exchange from Nature. American Statistician 48:2-11.

174. International Committee of Medical Journal Editors. 1991. Uniform requirements for manuscripts submitted to biomedical journals (special report). New England Journal of Medicine 324:424-428.

175. Jamart, J. 1992. Statistical tests in medical research. Acta Oncologica 31:723-727.

176. James, R. 1999. Back to Bayes. Higher Education Review 32:68-72.

177. Jeffreys, H. 1961. Theory of probability. Third ed. Oxford Univ. Press, Oxford, U.K.

178. Jeffreys, W. H. 1995a. On p-values and chance. Journal of Scientific Exploration 9:121-?.

179. Jeffreys, W. H. 1995b. Further comments on p-values and chance. Journal of Scientific Exploration 9:595-?.

180. Jegerski, J. A. 1990. Replication in behavioral research. Journal of Social Behavior and Personality 5(4;Special Issue):37-39.

181. John, I. D. 1992. Statistics as rhetoric in psychology. Australian Psychologist 27:144-149.

182. Johnson, D. H. 1995. Statistical sirens: the allure of nonparametrics. Ecology 76:1998-2000.

183. Johnson, D. H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772.

184. Johnstone, D. 1988. Comments on Oakes on the foundation of statistical inference in the social and behavioral sciences: the market for statistical significance. Psychological Reports 63:319-331.

185. Johnstone, D. J. 1994. A statistical paradox in auditing. Abacus 30:44-49.

186. Johnstone, D. J. 1995. Statistically incoherent hypothesis testing in auditing. Auditing 14:156-175.

187. Jones, B., P. Jarvis, J. A. Lewis, and A. F. Ebbutt. 1996. Trials to assess equivalence: the importance of rigorous methods. British Medical Journal 313:36-39.

188. Jones, D. 1984. Use, misuse, and role of multiple-comparison procedures in ecological and agricultural entomology. Environmental Entomology 13:635-649.

189. Jones, D., and N. Matloff. 1986. Statistical hypothesis testing in biology: a contradiction in terms. Journal of Economic Entomology 79:1156-1160.

190. Jones, L. V. 1955. Statistics and research design. Annual Review of Psychology 6:405-430.

191. Kaiser, H. F. 1960. Directional statistical decisions. Psychological Review 67:160-167.

192. Katzer, J., and J. Sodt. 1973. An analysis of the use of statistical testing in communication research. Journal of Communication 23:251-265.

193. Kelbaek, H. S., T. Gjorup, and J. Hilden. 1990. [Confidence intervals instead of p-values.] [Danish] Ugeskrift for Laeger 152:2623-2628.

194. Kellow, J. T. 1998. Beyond statistical significance tests: the importance of using other estimates of treatment effects to interpret evaluation results. American Journal of Evaluation 19:123-134.

195. Kempthorne, O. 1966. Some aspects of experimental inference. Journal of the American Statistical Association 61:11-34.

196. Kempthorne, O. 1976. Of what use are tests of significance and tests of hypotheses. Communications in Statistics, Series A 5:763-777.

197. Kendall, P. 1970. Note on significance tests. Pages 87-90 in D. E. Morrison and R. E. Henkel, eds. The significance test controversy - a reader. Aldine Publ. Co., Chicago, Ill.

198. Keuzenkamp, H. A., and A. P. Barten. 1995. Rejection without falsification, on the history of testing the homogeneity condition in the theory of consumer demand. Journal of Econometrics 67:103-127.

199. Keuzenkamp, H. A., and J. R. Magnus. 1995. On tests and significance in econometrics. Journal of Econometrics 67:5-24.

200. Keynes, J. M. 1921. A treatise on probability: the collected writings of John Maynard Keynes, VIII. St. Martin’s Press, New York, N.Y.

201. Kirk, R. E. 1996. Practical significance: a concept whose time has come. Educational and Psychological Measurement 56:746-759.

202. Kish, L. 1959. Some statistical problems in research design. American Sociological Review 24:328-338.

203. Krantz, D. H. 1999. The null hypothesis testing controversy in psychology. Journal of the American Statistical Association 94:1372-1381.

204. Krebs, C. J. 1989. Ecological methodology. Harper & Row, New York, N.Y. 704pp.

205. Krueger, J. 2001. Null hypothesis significance testing: on the survival of a flawed method. American Psychologist 56:16-26.

206. Kruskal, W. H. 1978. Significance, tests of. Pages 944-958 in W. H. Kruskal and J. M. Tanur, eds. International encyclopedia of statistics. Free Press, New York, N.Y.

207. Kruskal, W. H. 1980. The significance of Fisher: a review of R. A. Fisher: The Life of a Scientist. Journal of the American Statistical Association 75:1019-1030.

208. Kruskal, W. H., and R. Majors. 1989. Concepts of relative importance in recent scientific literature. American Statistician 43:2-6.

209. Kupfersmid, J. 1988. Improving what is published: a model in search of an editor. American Psychologist 43:635-642.

210. Langman, M. J. S. 1986. Towards estimation and confidence intervals. British Medical Journal 292:716.

211. Leamer, E. E. 1978. Specification searches: ad hoc inference with nonexperimental data. J. Wiley & Sons, New York, N.Y.

212. Lecoutre, B., and J. Poitevineau. 2000. Beyond traditional significance tests: prime time for new publication norms. [French] L’Annee Psychologique 100:683-713.

213. Levine, R. L., and J. E. Hunter. 1983. Regression methodology: correlation, meta-analysis, confidence intervals, and reliability. Journal of Leisure Research 15:323-343.

214. Lindgren, B. R., C. L. Wielinski, and S. M. Finkelstein. 1994. Contrasting clinical and statistical significance within the research setting. Pediatric Pulmonology 18:64-65.

215. Lindley, D. V. 1986. Discussion. The Statistician 35:502-504.

216. Lindsay, R. M. 1995. Reconsidering the status of tests of significance: an alternative criterion of adequacy. Accounting, Organizations and Society 20:35-53.

217. Lipset, S. M., M. A. Trow, and J. S. Coleman. 1970. Statistical problems. Pages 81-86 in D. E. Morrison and R. E. Henkel, eds. The significance test controversy - a reader. Aldine Publ. Co., Chicago, Ill.

218. Little, T. M. 1981. Interpretation and presentation of results. HortScience 16:637-640.

219. Llobell, J. P., J. F. G. Perez, and M. D. F. Navarro. 2000. Statistical significance and replicability of the data. [Spanish] Psicothema 12:408-412.

220. Loebbecke, J. K. 1995. On the use of Bayesian statistics in the audit process. Auditing 14:176-187.

221. Loftus, G. R. 1991. On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology 36:102-105.

222. Loftus, G. R. 1993a. A picture is worth a thousand p values: on the irrelevance of hypothesis testing in the microcomputer age. Behavior Research Methods, Instruments, and Computers 25:250-256.

223. Loftus, G. R. 1993b. Editorial comment. Memory and Cognition 21:1-3.

224. Loftus, G. R. 1996. Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science 5:161-171.

225. Loftus, G. R., and M. J. Masson. 1994. Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review 1:476-490.

226. Lykken, D. T. 1968. Statistical significance in psychological research. Psychological Bulletin 70:151-159.

227. MacDonald, R. R. 1997. On statistical testing in psychology. British Journal of Psychology 88:333-349.

228. Mainland, D. 1982. Medical statistics - thinking vs. arithmetic. Journal of Chronic Diseases 35:413-417.

229. Maret, T. J., and R. E. Ziemba. 1997. Statistics and hypothesis testing in biology. Journal of College Science Teaching 26(4):283.

230. Matloff, N. S. 1991. Statistical hypothesis testing: problems and alternatives. Environmental Entomology 20:1246-1250.

231. Matthews, J. N., and D. G. Altman. 1996. Statistical notes. Interaction 2: compare effect sizes not P values. British Medical Journal 313:808.

232. Mawera, G. 1996. A proposal for the reporting of p-values in hypothesis testing and evaluation. Central African Journal of Medicine 42(9):284-285.

233. McBride, G. B., J. C. Loftis, and N. C. Adkins. 1993. What do significance tests really tell us about the environment? Environmental Management 17:423-432.

234. McCall, R. B. 1975. Fundamental statistics of psychology. Second ed. Harcourt, Brace & Jovanovich, New York, N.Y. 406pp.

235. McCloskey, D. N. 1985a. The loss function has been mislaid: the rhetoric of significance tests. American Economic Review 75:201-205.

236. McCloskey, D. N. 1985b. The rhetoric of economics. Univ. Wisconsin Press, Madison, Wisc.

237. McCloskey, D. N. 1995. The insignificance of statistical significance. Scientific American 272(4):32-33.

238. McCloskey, D. N., and S. T. Ziliak. 1996. The standard error of regressions. Journal of Economic Literature 34:97-114.

239. McClure, J., and H. K. Suen. 1994. Interpretation of statistical significance testing: a matter of perspective. Topics in Early Childhood Special Education 14:88-100.

240. McGrath, R. E. 1998. Significance testing: is there something better? American Psychologist 53:796-797.

241. McLean, J. E., and J. M. Ernest. 1998. The role of statistical significance testing in educational research. Research in the Schools 5:15-22.

242. McNemar, Q. 1960. At random: sense and nonsense. American Psychologist 15:295-300.

243. Meehl, P. E. 1967. Theory testing in psychology and physics: a methodological paradox. Philosophy of Science 34:103-115.

244. Meehl, P. E. 1978. Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology 46:806-834.

245. Meehl, P. E. 1990a. Appraising and amending theories: the strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry 1:108-141.

246. Meehl, P. E. 1990b. Why summaries of research on psychological theories are often uninterpretable. Psychological Reports 66:195-244.

247. Meehl, P. E. 1997. The problem is epistemology, not statistics: replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. Pages 393-425 in L. L. Harlow, S. A. Mulaik, and J. H. Steiger, eds. What if there were no significance tests? Lawrence Erlbaum Associates, Mahwah, N.J.

248. Menon, R. 1993. Statistical significance testing should be discontinued in mathematics education research. Mathematics Education Research Journal 5:4-18.

249. Micceri, T. 1989. The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin 105:156-166.

250. Molenaar, W. 1977. I get sick from statistics, or known rules that are not obeyed. [Dutch] Mens en Maatschappij 52:58-71.

251. Moore, D. S., and G. P. McCabe. 1993. Introduction to the practice of statistics. Second ed. W. H. Freeman and Co., New York, N.Y. 854pp.

252. Morrison, D. E., and R. E. Henkel. 1969. Significance tests reconsidered. American Sociologist 4:131-140.

253. Morrison, D. E., and R. E. Henkel. 1970a. Significance tests in behavioral research: skeptical conclusions and beyond. Pages 305-311 in D. E. Morrison and R. E. Henkel, eds. The significance test controversy - a reader. Aldine Publ. Co., Chicago, Ill.

254. Morrison, D. E., and R. E. Henkel, eds. 1970b. The significance test controversy - a reader. Aldine Publ. Co., Chicago, Ill.

255. Morrow, G. R. 1980. Clinical trials in psychosocial medicine: methodologic and statistical considerations. Cancer Treatment Reports 64:451-456.

256. Morrow, G. R., P. M. Black, and D. J. Dudgeon. 1991. Advances in data assessment - application to the etiology of nausea reported during chemotherapy, concerns about significance testing, and opportunities in clinical trials. Cancer 67:780-787.

257. Murray, G. D. 1991. Statistical aspects of research methodology. British Journal of Surgery 78:777-781.

258. Murray, L. R. 1995. Reconsidering the status of tests of significance: an alternative criterion of adequacy. Accounting, Organizations, and Society 20:35-??.

259. Navarro, M. D. F., J. P. Llobell, and J. F. G. Perez. 2000. Effect size and statistical significance. [Spanish] Psicothema 12:236-240.

260. Nelder, J. A. 1971. Discussion on papers by Wynn, Bloomfield, O'Neill and Wetherill. Journal of the Royal Statistical Society, Series B 33:244-246.

261. Nelder, J. A. 1985. Discussion of Dr Chatfield's paper. Journal of the Royal Statistical Society, Series A 148:238.

262. Nelson, N., R. Rosenthal, and R. L. Rosnow. 1986. Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist 41:1299-1301.

263. Nester, M. R. 1996. An applied statistician’s creed. Applied Statistics 45:401-410.

264. Neyman, J. 1958. The use of the concept of power in agricultural experimentation. Journal of the Indian Society of Agricultural Statistics 9:9-17.

265. Neyman, J., and E. S. Pearson. 1933. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A 231:289-337.

266. Nickerson, R. S. 2000. Null hypothesis significance testing: a review of an old and continuing controversy. Psychological Methods 5:241-301.

267. Nix, T. W., and J. J. Barnette. 1998. The data analysis dilemma: ban or abandon. A review of null hypothesis significance testing. Research in the Schools 5(2):3-14.

268. Nunnally, J. 1960. The place of statistics in psychology. Educational and Psychological Measurement 20:641-650.

269. Oakes, M. W. 1986. Statistical inference: a commentary for the social and behavioural sciences. J. Wiley & Sons, Inc., Chichester, U.K. 185pp.

270. Ottenbacher, K. J. 1992. Practical significance in early intervention research: from affect to empirical effect. Journal of Early Intervention 16:181-193.

271. Ottenbacher, K. J. 1995. Why rehabilitation research does not work (as well as we think it should). Archives of Physical Medicine and Rehabilitation 76(2):123-129.

272. Pagano, M., and A. Leviton. 1990. File drawers, p-values, and efficacy of drugs. Journal of Clinical Epidemiology 43:1012-1013.

273. Parkhurst, D. F. 1985. Interpreting failure to reject a null hypothesis. Bulletin of the Ecological Society of America 66:301-302.

274. Parkhurst, D. 1990. Statistical hypothesis tests and statistical power in pure and applied science. Pages 181-201 in G. M. von Furstenberg, ed. Acting under uncertainty: multidisciplinary conceptions. Kluwer Academic Publ., Boston, Mass.

275. Pearce, S. C. 1992. Data analysis in agricultural experimentation. II. Some standard contrasts. Experimental Agriculture 28:375-383.

276. Pearson, K. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series V 50:157-175.

277. Perry, J. N. 1986. Multiple-comparison procedures: a dissenting view. Journal of Economic Entomology 79:1149-1155.

278. Piantadosi, S., N. Saijo, and T. Tamura. 1993. Guidelines for analysis and reporting of clinical trials in oncology. Japanese Journal of Cancer Research 84:929-937.

279. Pocock, S. J., M. D. Hughes, and R. J. Lee. 1987. Statistical problems in the reporting of clinical trials. New England Journal of Medicine 317:426-432.

280. Pollard, P. 1993. How significant is “significance”? Pages 448-460 in G. Keren and C. Lewis, eds. A handbook for data analysis in the behavioral sciences: methodological issues. Lawrence Erlbaum, Hillsdale, N.J.

281. Pollard, P., and J. T. E. Richardson. 1987. On the probability of making Type I errors. Psychological Bulletin 102:159-163.

282. Posavac, E. J. 1998. Toward more informative uses of statistics: alternatives for program evaluators. Evaluation and Program Planning 21:243-254.

283. Pratt, J. W. 1976. A discussion of the question: for what use are tests of hypotheses and tests of significance. Communications in Statistics, Series A 5:779-787.

284. Preece, D. A. 1982. The design and analysis of experiments: what has gone wrong? Utilitas Mathematica 21A:201-244.

285. Preece, D. A. 1984. Biometry in the Third World: science not ritual. Biometrics 40:519-523.

286. Preece, D. A. 1990. R. A. Fisher and experimental design: a review. Biometrics 46:925-935.

287. Quinn, J. F., and A. E. Dunham. 1983. On hypothesis testing in ecology and evolution. American Naturalist 122:602-617.

288. Ramp, W. K., and J. M. Yancey. 1991. P values and their problems. Bone and Mineral 13:163-165.

289. Ranstam, J. 1996. A common misconception about p-values and its consequences. Acta Orthopaedica Scandinavica 67:505-507.

290. Reckhow, K. H., J. T. Clements, and R. C. Dodd. 1990. Statistical evaluation of mechanistic water-quality models. Journal of Environmental Engineering 116:250-268.

291. Rennie, D. 1978. Vive la difference (p < 0.05). New England Journal of Medicine 299:828.

292. Rennie, L. J. 1998. Improving the interpretation and reporting of quantitative research. Journal of Research in Science Teaching 35:237-248.

293. Riopelle, A. J. 2000. Are effect sizes and confidence levels problems for or solutions to the null hypothesis test? Journal of General Psychology 127:198-216.

294. Roberts, H. V. 1976. For what use are tests of hypotheses and tests of significance. Communications in Statistics, Series A 5:753-761.

295. Roberts, H. V. 1990. Applications in business and economic statistics: some personal views. Statistical Science 5:372-402.

296. Roebruck, P. 1984. Explorative statistical analysis and the valuation of hypotheses. Revue d’Epidemiologie et de Sante Publique 32(3-4):181-184.

297. Rosenkrantz, R. D. 1977. Support. Synthese 36:181-193.

298. Rosenthal, R. 1983. Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology 51:4-13.

299. Rosenthal, R. 1991. Cumulating psychology: an appreciation of Donald T. Campbell. Psychological Science 2:217-221.

300. Rosenthal, R. 1992. Effect size estimation, significance testing, and the file-drawer problem. Journal of Parapsychology 56:57-58.

301. Rosenthal, R. 1993. Cumulating evidence. Pages 519-559 in G. Keren and C. Lewis, eds. A handbook for data analysis in the behavioral sciences: methodological issues. Lawrence Erlbaum, Hillsdale, N.J.

302. Rosnow, R. L., and R. Rosenthal. 1988. Definition and interpretation of interaction effects. Psychological Bulletin 105:143-146.

303. Rosnow, R. L., and R. Rosenthal. 1989. Statistical procedures and the justification of knowledge in psychological science. American Psychologist 44:1276-1284.

304. Rothman, K. J. 1978. A show of confidence (editorial). New England Journal of Medicine 299:1362-1363.

305. Rothman, K. J. 1986. Significance questing (editorial). Annals of Internal Medicine 105:445-447.

306. Rothman, K. J. 1988. Modern epidemiology. Little, Brown and Co., Boston, Mass.

307. Rothman, K. J., and A. Yankauer. 1986. Editor’s note (Letters). American Journal of Public Health 76:587-588.

308. Rothstein, H., and M. C. Tonges. 2000. Beyond the significance test in administrative research and policy decisions. Journal of Nursing Scholarship 32(1):65-70.

309. Rozeboom, W. W. 1960. The fallacy of the null hypothesis significance test. Psychological Bulletin 57:416-428.

310. Royall, R. 1986. The effect of sample size on the meaning of significance tests. American Statistician 40:313-315.

311. Royall, R. M. 1997. Statistical evidence: A likelihood paradigm. International Thomson Publishing, New York, N.Y. 192pp.

312. Royall, R. 2000. On the probability of observing misleading statistical evidence. Journal of the American Statistical Association 95:760-768.

313. Rubin, A. 1981. Reexamining the impact of sex on salary: the limits of statistical significance. Social Work Research and Abstracts 17:22.

314. Salsburg, D. S. 1985. The religion of statistics as practiced in medical journals. American Statistician 39:220-223.

315. Savage, I. R. 1957. Nonparametric statistics. Journal of the American Statistical Association 52:331-344.

316. Savage, L. 1962. The foundations of statistical inference: a discussion. J. Wiley & Sons, New York, N.Y.

317. Savitz, D. A. 1993. Is statistical significance testing useful in interpreting data? Reproductive Toxicology 7:95-100.

318. Savitz, D. A., K.-A. Tolo, and C. Poole. 1994. Statistical significance testing in the American Journal of Epidemiology, 1970-1990. American Journal of Epidemiology 139:1047-1052.

319. Sawyer, A. G., and J. P. Peter. 1983. The significance of statistical significance tests in marketing research. Journal of Marketing Research 20:122-133.

320. Scarr, S. 1997. Rules of evidence: a larger context for the statistical debate. Psychological Science 8:16-17.

321. Schervish, M. J. 1996. P values: what they are and what they are not. American Statistician 50:203-206.

322. Schmidt, F. L. 1992. What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. American Psychologist 47:1173-1181.

323. Schmidt, F. L. 1996a. Board of scientific affairs action on significance testing. Industrial-Organizational Psychologist 33:110-111.

324. Schmidt, F. L. 1996b. Statistical significance testing and cumulative knowledge in psychology: implications for training of researchers. Psychological Methods 1:115-129.

325. Schmidt, F. L., and J. E. Hunter. 1995. The impact of data-analysis methods on cumulative research knowledge: statistical significance testing, confidence intervals, and meta-analysis. Evaluation and the Health Professions 18:408-427.

326. Schmidt, F. L., and J. E. Hunter. 1997. Eight common but false objections to the discontinuation of significance testing in the analysis of research data. Pages 37-64 in L. L. Harlow, S. A. Mulaik, and J. H. Steiger, eds. What if there were no significance tests? Lawrence Erlbaum Associates, Mahwah, N.J.

327. Schulman, J. L., M. J. Kupst, and B. G. Suran. 1976. The worship of “p”: significant yet meaningless research results. Bulletin of the Menninger Clinic 40:134-143.

328. Seeman, J. 1973. On supervising student research. American Psychologist 28:900-906.

329. Sellke, T., M. J. Bayarri, and J. O. Berger. 2001. Calibration of p values for testing precise null hypotheses. American Statistician 55:62-71.

330. Selvin, H. 1957. A critique of tests of significance in survey research. American Sociological Review 22:519-527.

331. Selvin, S., and M. C. White. 1993. Description and reporting of statistical methods. American Journal of Infection Control 21(4):210-215.

332. Serlin, R. C. 1993. Confidence intervals and the scientific method: a case for Holm on the range. Journal of Experimental Education 61:350-360.

333. Serlin, R. C., and D. K. Lapsley. 1985. Rationality in psychological research: the good-enough principle. American Psychologist 40:73-83.

334. Serlin, R. C., and D. K. Lapsley. 1993. Rational appraisal of psychological research and the good enough principle. Pages 199-228 in G. Keren and C. Lewis, eds. A handbook for data analysis in the behavioral sciences: methodological issues. Lawrence Erlbaum, Hillsdale, N.J.

335. Shaver, J. P. 1985a. Chance and nonsense: a conversation about interpreting tests of statistical significance, Part 1. Phi Delta Kappan 67:57-60.

336. Shaver, J. P. 1985b. Chance and nonsense: a conversation about interpreting tests of statistical significance, Part 2. Phi Delta Kappan 67:138-141. Erratum, 1986, 67:624.

337. Shaver, J. P. 1993. What statistical significance testing is, and what it is not. Journal of Experimental Education 61:293-316.

338. Shulman, L. S. 1970. Reconstruction of educational research. Review of Educational Research 40:371-393.

339. Signorelli, A. 1974. Statistics: tool or master of the psychologist? American Psychologist 29:774-777.

340. Sim, J., and N. Reid. 1999. Statistical inference by confidence intervals: issues of interpretation and utilization. Physical Therapy 79:186-195.

341. Simon, R. 1986. Confidence intervals for reporting results of clinical trials. Annals of Internal Medicine 105:429-435.

342. Simon, R., and R. E. Wittes. 1985. Methodologic guidelines for reports of clinical trials. Cancer Treatment Reports 69:1-3.

343. Skinner, B. F. 1956. A case history in scientific method. American Psychologist 11:221-233.

344. Skipper Jr., J. K., A. L. Guenther, and G. Nass. 1967. The sacredness of .05: a note concerning the uses of statistical levels of significance in social science. American Sociologist 2:16-18.

345. Slakter, M. J., Y. Wu, and N. S. Suzuki-Slakter. 1991. *, **, ***; statistical nonsense at the .00000 level. Nursing Research 40:248-249.

346. Smith, K. 1983. Tests of significance: some frequent misunderstandings. American Journal of Orthopsychiatry 53:315-321.

347. Smithson, M. J. 1999. Statistics with confidence: an introduction for psychologists. Sage Publications, London, U.K.

348. Snyder, P., and S. Lawson. 1993. Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education 61:334-349.

349. Snyder, P. A., and B. Thompson. 1998. Use of tests of statistical significance and other analytic choices in a school psychology journal: review of practices and suggested alternatives. School Psychology Quarterly 13:335-348.

350. Sohn, D. 1998. Statistical significance and replicability: why the former does not presage the latter. Theory and Psychology 8:291-311.

351. Soric, B. 1989. Statistical “discoveries” and effect-size estimation. Journal of the American Statistical Association 84:608-610.

352. Spielman, S. 1978. Statistical dogma and the logic of significance testing. Philosophy of Science 45:120-135.

353. Steidl, R. J., J. P. Hayes, and E. Schauber. 1997. Statistical power analysis in wildlife research. Journal of Wildlife Management 61:270-279.

354. Sterne, J. A. C., and G. D. Smith. 2001. Sifting the evidence – what’s wrong with significance tests? British Medical Journal 322:226-231.

355. Stevens, S. S. 1968. Measurement, statistics, and the schemapiric view. Science 161:849-856.

356. Street, D. J. 1990. Fisher's contributions to agricultural statistics. Biometrics 46:937-945.

357. Suen, H. K. 1992. Significance testing: necessary but insufficient. Topics in Early Childhood Special Education 12(1):66-81.

358. Summers, L. H. 1991. The scientific illusion in empirical macroeconomics. Scandinavian Journal of Economics 93:129-148.

359. Sutlive, V. H., and D. A. Ulrich. 1998. Interpreting statistical significance and meaningfulness in adapted physical activity research. Adapted Physical Activity Quarterly 15(2):103-118.

360. Taylor, S., and S. Muncer. 2000. Redressing the power and effect of significance. A new approach to an old problem: teaching statistics to nursing students. Nurse Education Today 20:358-364.

361. Thompson, B. 1993. The use of statistical significance tests in research: bootstrap and other alternatives. Journal of Experimental Education 61:361-377.

362. Thompson, B. 1996. AERA editorial policies regarding statistical significance testing: three suggested reforms. Educational Researcher 25:26-30.

363. Thompson, B. 1997a. Editorial policies regarding statistical significance tests: further comments. Educational Researcher 26(5):29-32.

364. Thompson, B. 1997b. Statistical significance testing practices in The Journal of Experimental Education. Journal of Experimental Education 66:75-83.

365. Thompson, B. 1998. Statistical significance and effect size reporting: portrait of a possible future. Research in the Schools 5(2):33-38.

366. Thompson, B. 1999a. Improving research clarity and usefulness with effect size indices as supplements to statistical significance tests. Exceptional Children 65:329-337.

367. Thompson, B. 1999b. Journal editorial policies regarding statistical significance tests: heat is to fire as p is to importance. Educational Psychology Review 11:157-169.

368. Thompson, B. 1999c. If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory and Psychology 9:167-183.

369. Thompson, B. 1999d. Statistical significance tests, effect size reporting, and the vain pursuit of pseudo-objectivity. Theory and Psychology 9:191-196.

370. Thompson, B. 1999e. Why “encouraging” effect size reporting is not working: the etiology of researcher resistance to changing practices. Journal of Psychology 133:133-140.

371. Thompson, B., and T. Vacha-Haase. 2000. Psychometrics is datametrics: the test is not reliable. Educational and Psychological Measurement 60:174-195.

372. Thompson, W. D. 1987. Statistical criteria in the interpretation of epidemiologic data. American Journal of Public Health 77:191-194.

373. Torabi, M. R., and K. Ding. 1998. Selected critical measurement and statistical issues in health education evaluation and research. International Electronic Journal of Health Education 1(1):26-28.

374. Tversky, A., and D. Kahneman. 1971. Belief in the law of small numbers. Psychological Bulletin 76:105-110.

375. Tyler, R. W. 1931. What is statistical significance? Educational Research Bulletin 10:115-118.

376. Tryon, W. W. 1998. The inscrutable null hypothesis. American Psychologist 53:796.

377. Upton, G. J. G. 1992. Fisher's exact test. Journal of the Royal Statistical Society, Series A 155:395-402.

378. Vacha-Haase, T., and J. E. Nilsson. 1998. Statistical significance reporting: current trends and uses in MECD. Measurement and Evaluation in Counseling and Development 31:46-57.

379. Vacha-Haase, T., and B. Thompson. 1998. Further comments on statistical significance tests. Measurement and Evaluation in Counseling and Development 31(1):63-67.

380. Vacha-Haase, T., J. E. Nilsson, D. R. Reetz, T. S. Lance, and B. Thompson. 2000. Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory and Psychology 10:413-425.

381. Vardeman, S. B. 1987. Comment. Journal of the American Statistical Association 82:130-131.

382. Walker, A. M. 1986a. Reporting the results of epidemiologic studies. American Journal of Public Health 76:556-558.

383. Wallis, W. A., and H. V. Roberts. 1956. Statistics: a new approach. MacMillan Publ. Co., New York, N.Y.

384. Wang, C. 1993. Sense and nonsense of statistical inference: controversy, misuse, and subtlety. Marcel Dekker, Inc., New York, N.Y. 256pp.

385. Ward, R. C., J. C. Loftis, and G. B. McBride. 1990. Design of water quality monitoring systems. Van Nostrand Reinhold, New York, N.Y. 231pp.

386. Warren, W. G. 1986. On the presentation of statistical analysis: reason or ritual. Canadian Journal of Forest Research 16:1185-1191.

387. Weinbach, R. W. 1989. When is statistical significance meaningful? A practice perspective. Journal of Sociology and Social Welfare 16:31-37.

388. Wiens, J. A. 1989. The ecology of bird communities. Cambridge University Press, Cambridge, U.K.

389. Weitzman, R. A. 1984. Seven treacherous pitfalls of statistics, illustrated. Psychological Reports 54:355-363.

390. Wilcox, R. R. 1998. How many discoveries have been lost by ignoring modern statistical methods? American Psychologist 53:300-314.

391. Wilson, K. V. 1961. Subjectivist statistics for the current crisis. Contemporary Psychology 6:229-231.

392. Wilson, W. R., and H. Miller. 1964. A note on the inconclusiveness of accepting the null hypothesis. Psychological Review 71:238-242.

393. Wilson, W. R., H. Miller, and J. S. Lower. 1967. Much ado about the null hypothesis. Psychological Bulletin 67:188-197.

394. Wolfowitz, J. 1967. Remarks on the theory of testing hypotheses. New York Statistician 18:439-441.

395. Wonnacott, R. J., and T. H. Wonnacott. 1985. Introductory statistics. Fourth ed. J. Wiley & Sons, Inc., New York, N.Y.

396. Woolson, R. F., and J. C. Kleinman. 1989. Perspectives on statistical significance. Annual Review of Public Health 10:423-440.

397. Wulff, H. R. 1973. Confidence limits in evaluating controlled therapeutic trials. Lancet 2:969-970.

398. Yates, F. 1951. The influence of Statistical Methods for Research Workers on the development of the science of statistics. Journal of the American Statistical Association 46:19-34.

399. Yates, F. 1964. Sir Ronald Fisher and the design of experiments. Biometrics 20:307-321.

400. Yoccoz, N. G. 1991. Use, overuse, and misuse of significance tests in evolutionary biology and ecology. Bulletin of the Ecological Society of America 72:106-111.

401. Young, M. A. 1993. Supplementing tests of statistical significance: variation accounted for. Journal of Speech and Hearing Research 36:644-656.

402. Zeisel, H. 1955. The significance of insignificant differences. Public Opinion Quarterly 19:319-321.
