/docs/statistics/peerreview/ Directory Listing

Directories

Files

  • 2020-jerrim.pdf: John Jerrim, Robert de Vries (2020-03-06):

    Peer-review is widely used throughout academia, most notably in the publication of journal articles and the allocation of research grants. Yet peer-review has been subject to much criticism, including being slow, unreliable, subjective and potentially prone to bias. This paper contributes to this literature by investigating the consistency of peer-reviews and the impact they have upon a high-stakes outcome (whether a research grant is funded).

    Analysing data from 4,000 social science grant proposals and 15,000 reviews, this paper illustrates how the peer-review scores assigned by different reviewers have only low levels of consistency (a correlation between reviewer scores of only 0.2). Reviews provided by ‘nominated reviewers’ (i.e. reviewers selected by the grant applicant) appear to be overly generous and do not correlate with the evaluations provided by independent reviewers. Yet a positive review from a nominated reviewer is strongly linked to whether a grant is awarded. Finally, a single negative peer-review is shown to reduce the chances of a proposal being funded from around 55% to around 25% (even when it has otherwise been rated highly).

    [Keywords: peer-review, consistency, grant funding]

  • 2020-aboozar.pdf: “Is Scholarly Refereeing Productive (at the Margin)?”⁠, Aboozar Hadavand, Daniel S. Hamermesh, Wesley W. Wilson

  • 2018-teplitskiy.pdf: Misha Teplitskiy, Daniel Acuna, Aïda Elamrani-Raoult, Konrad Körding, James Evans (2018-07-26):

    • Do connections between reviewer and author affect assessment of manuscript validity?
    • Peer reviews from a validity-focused journal show evidence of co-author favoritism.
    • We use reviewer nominations and distant co-authorship ties to explore mechanisms.
    • We argue epistemic differences between scholarly communities affect judgment.

    Professional connections between the creators and evaluators of scientific work are ubiquitous, and the possibility of bias ever-present. Although connections have been shown to bias predictions of uncertain future performance, it is unknown whether such biases occur in the more concrete task of assessing scientific validity for completed works, and if so, how.

    This study presents evidence that connections between authors and reviewers of neuroscience manuscripts are associated with biased judgments and explores the mechanisms driving that effect. Using reviews from 7981 neuroscience manuscripts submitted to the journal PLOS ONE, which instructs reviewers to evaluate manuscripts on scientific validity alone, we find that reviewers favored authors close in the co-authorship network by ~0.11 points on a 1.0–4.0 scale for each step of proximity. PLOS ONE’s validity-focused review and the substantial favoritism shown by distant vs. very distant reviewers, both of whom should have little to gain from nepotism, point to the central role of substantive disagreements between scientists in different professional networks (“schools of thought”).

    These results suggest that removing bias from peer review cannot be accomplished simply by recusing closely connected reviewers, and highlight the value of recruiting reviewers embedded in diverse professional networks.

    [Keywords: peer review, research evaluation, bias, social network, co-authorship, resource allocation]

  • 2018-baldwin.pdf: “Scientific Autonomy, Public Accountability, and the Rise of ‘Peer Review’ in the Cold War United States”⁠, Melinda Baldwin

  • 2017-goldstein.pdf: “Uncertainty and Individual Discretion in Allocating Research Funds”⁠, Anna P. Goldstein, Michael Kearney (backlinks)

  • 2015-lancet.pdf: “Protocol review at The Lancet: 1997–2015”⁠, The Editors of The Lancet

  • 2013-smulders.pdf: “A two-step manuscript submission process can reduce publication bias”⁠, Yvo M. Smulders

  • 2008-gonzalezalvarez.pdf: “Science in the 21st century: social, political, and economic issues”⁠, Juan R. González-Álvarez (backlinks)

  • 1994-gans.pdf: Joshua S. Gans, George B. Shepherd (1994-01-01; backlinks):

    The authors asked the world’s leading economists to describe instances in which journals rejected their articles. More than sixty essays, by a broadly diverse group that includes fifteen Nobel Prize winners, indicate that most have suffered publication rejection, often frequently. Indeed, journals have rejected many papers that later became classics. The authors discuss the prize-winners’ experiences, other notable cases, and rejections by John Maynard Keynes when he edited the Economic Journal. Finally, they search in economists’ almost universal experience of rejection for patterns and lessons about the publication process.

  • 1990-horrobin.pdf: David F. Horrobin (1990-03-09):

    Peer review can be performed successfully only if those involved have a clear idea as to its fundamental purpose. Most authors of articles on the subject assume that the purpose of peer review is quality control. This is an inadequate answer. The fundamental purpose of peer review in the biomedical sciences must be consistent with that of medicine itself, to cure sometimes, to relieve often, to comfort always. Peer review must therefore aim to facilitate the introduction into medicine of improved ways of curing, relieving, and comforting patients. The fulfillment of this aim requires both quality control and the encouragement of innovation. If an appropriate balance between the two is lost, then peer review will fail to fulfill its purpose.

    …But I think we must take seriously the possibility that we have traded innovation for quality control, not only in medical publishing but throughout medical science. Here is a specific example. My particular historical interest is in the development of psychiatric therapy. There are 5 major types of drugs in use in psychiatry: the neuroleptics, the benzodiazepines, the tricyclic antidepressants and related compounds, the monoamine oxidase inhibitors, and lithium. All 5 classes were discovered prior to 1960. Some new molecular variants have been introduced, but all the original compounds are still extensively used and no major new therapeutic principles have been developed and shown to be effective clinically. This is in spite of the incomparably greater expenditure on research in neurobiology and psychiatry since 1960. [Scott Alexander notes that older psychiatry drugs are also more highly rated by patients⁠.]

    Lithium is in some respects the most successful of these 5 classes of compounds. It is the only one that when properly used appears to bring about a true normalization of behavior. Yet modern peer review practices would certainly have blocked its introduction. Cade worked under primitive conditions in a psychiatric hospital in Australia in the period immediately following the Second World War. His animal experiments were crude and would not now be regarded as remotely adequate to justify a trial in humans. Yet more comprehensive and detailed animal studies would have been impossible because of a lack of resources. The article describing his completely uncontrolled clinical observations would almost certainly now have been rejected. If that had happened, it is very doubtful whether Cade would have been in a position to do the additional work that would justify publication and lithium would have been lost to medicine. Cade’s originality would probably not have overcome the current emphasis on accuracy and reliability.

    [Horrobin provides an additional 18 examples of work initially suppressed by the publication & grant-making process.]

    …This is by no means a complete list of all the examples of which I am aware of situations in which peer review has delayed, emasculated, or totally prevented the publication and investigation of potentially important findings. The list is extensive enough to demonstrate that, while antagonism to innovation during the peer review process may not be the norm, it is far from being exceptional. The examples I have given, together with the numerous cases of scientific fraud that have been documented in a book [Betrayers of the Truth, Broad & Wade 1983] and described ad nauseam in the pages of Nature and Science, demonstrate that what some might call psychopathology is not rare in the scientific community. Most decent scientists are reluctant to admit this and therefore reluctant to take it into consideration when assessing peer review. There can be little doubt that this wish to believe that all is for the best in the best possible world has led to serious injustice. If the shepherds do not believe that wolves exist, then some of the sheep are going to have a bad time.


  • 1989-strickland-storynihgrantsprograms.pdf: “The Story of the NIH Grants Programs”⁠, Stephen P. Strickland

  • 1988-kupfersmid.pdf

  • 1976-johnson.pdf: Martin U. Johnson (1976; backlinks):

    …even the most proper use of statistics may lead to spurious correlations or conclusions if there are inadequacies regarding the research process itself. One of these sources of error in the research process is related to selective reporting; another to human limitations with regard to the ability to make reliable observations or evaluations. says:

    The most common variant is, of course, the tendency to bury negative results. I only recently became aware of the massive size of this great graveyard for dead studies when a colleague expressed gratification that only a third of his studies ‘turned out’—as he put it. Recently, a second variant of this secret game was discovered, quite inadvertently, by Wolins, when he wrote to 37 authors to ask for the raw data on which they had based recent journal articles. Wolins found that of the 37 who replied, 21 reported their data to be either misplaced, lost, or inadvertently destroyed. Finally, after some negotiation, Wolins was able to complete 7 re-analyses on the data supplied from 5 authors. Of the 7, he found gross errors in 3—errors so great as to clearly change the outcome of the experiments already reported.

    It should also be stressed that Rosenthal and others have demonstrated that experimenters tend to arrive at results found to be in full agreement with their expectancies, or with the expectancies of those within the scientific establishment in charge of the rewards. Even if some of Rosenthal’s results have been questioned [especially the ‘Pygmalion effect’] the general tendency seems to be unaffected.

    I guess we can all agree upon the fact that selective reporting in studies on the reliability and validity of, for instance, a personality test is a bad thing. But what could be the reason for selective reporting? Why does a research worker manipulate his data? Is it only because the research worker has a ‘weak’ mind or does there exist some kind of ‘steering field’ that exerts such an influence that improper behavior on the part of the research worker occurs?

    It seems rather reasonable to assume that the editors of professional journals or research leaders in general could exert a certain harmful influence in this connection…There is no doubt at all in my mind about the ‘filtering’ or ‘shaping’ effect an editor may exert upon the output of his journal…As I see it, the major risk of selective reporting is not primarily a statistical one, but rather the research climate which the underlying policy creates (“you are ‘good’ if you obtain supporting results; you are ‘no-good’ if you only arrive at chance results”).

    …The analysis I carried out has had practical implications for the publication policy which we have stated as an ideal for our new journal: the European Journal of Parapsychology.

  • 1975-johnson.pdf: Martin U. Johnson (1975; backlinks):

    The author discusses how to increase the quality and reliability of the research and reporting process in experimental parapsychology. Three levels of bias and control of bias are discussed. The levels are referred to as Model 1, Model 2 and Model 3 respectively.

    1. Model 1 is characterized by its very low level of intersubjective control. The reliability of the results depends to a very great extent upon the reliability of the investigator and the editor.
    2. Model 2 is relevant to the case when the experimenter is aware of the potential risk of making both errors of observation and recording and tries to control this bias. However, this model of control does not make allowances for the case when data are intentionally manipulated.
    3. Model 3 depicts a rather sophisticated system of control. One feature of this model is that selective reporting will become harder, since the editor has to make his decision as regards the acceptance or rejection of an experimental article prior to the results being obtained, and subsequently based upon the quality of the outline of the experiment. However, it should be stressed that not even this model provides a fool-proof guarantee against deliberate fraud.

    It is assumed that the models of bias and control of bias under discussion are relevant to most branches of the behavioral sciences.

  • 1975-johnson-2.pdf: Martin U. Johnson (1975; backlinks):

    This copy represents our first ‘real’ issue of the European Journal of Parapsychology…As far as experimental articles are concerned, we would like to ask potential contributors to try and adhere to the publishing policy which we have outlined in the editorial of the demonstration copy, and which is also discussed at some length in the article: ‘Models of Bias and Control of Bias’ [Johnson 1975a], in this issue. In short we shall try to avoid selective reporting and yet at the same time we shall try to refrain from making our journal a graveyard for all those studies which did not ‘turn out’. These objectives may be fulfilled by the editorial rule of basing our judgment entirely on our impressions of the quality of the design and methodology of the planned study. The acceptance or rejection of a manuscript should if possible take place prior to the carrying out and the evaluation of the results of the study.

  • 2018-pier.pdf

  • 2016-findley.pdf (backlinks)

  • 2015-nyhan.pdf

  • 2005-glymour.pdf

  • 1997-armstrong.pdf

  • 1989-weiss.pdf

  • 1977-mahoney.pdf

  • 1976-rosenthal-experimenterexpectancyeffects-ch3.pdf (backlinks)

  • 1970-walster.pdf