A noise audit of the peer review of a scientific article: a WPOM journal case study
Keywords: peer review, evaluation of scientific journals, research evaluation, decision making, decision noise, making judgements
This study aims to be one of the first to analyse the level of noise in the peer review of scientific articles. Noise is defined as the undesired variability in the judgements that professionals make about the same topic or subject. We focus on evaluative judgements on which experts are expected to agree, as happens when we try to judge the quality of a scientific work. To measure noise, all that is needed is several judgements of the same case made by different people, so that their dispersion can be analysed (what Kahneman et al. call a noise audit). This was the procedure followed in this research. We asked a set of reviewers from the journal WPOM (Working Papers on Operations Management) to review the same manuscript, which had previously been accepted for publication in this journal, although the reviewers were unaware of that fact. The results indicated that, with two reviewers, the probability of this manuscript not being published would be close to 8%, while the probability of it having an uncertain future would be 40% (one favourable and one unfavourable opinion, or both suggesting substantial changes). With only one reviewer, the audited work would have encountered significant obstacles to publication in 25% of cases. The great advantage of measuring noise is that, once measured, it can usually be reduced. This article concludes by outlining some of the measures that scientific journals can put in place to improve their peer review processes.
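The noise-audit arithmetic summarised above (dispersion of judgements on one case, the risk with a single reviewer, and outcome probabilities for a two-reviewer panel) can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used in the study: it assumes a four-level recommendation scale (reject, major changes, minor changes, accept), treats every pair of distinct reviewers in the audit as an equally likely two-reviewer panel, and applies a hypothetical decision rule (`classify_pair`) chosen here for illustration. The names `SCORES`, `FAVOURABLE`, `dispersion`, `single_reviewer_risk`, `classify_pair` and `pair_outcomes` are all hypothetical, and the verdicts in the usage lines are made up, not the study's data.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical ordinal coding of reviewer recommendations (an assumption,
# not necessarily the scale used in the WPOM audit): higher = more favourable.
SCORES = {"reject": 1, "major": 2, "minor": 3, "accept": 4}
FAVOURABLE = {"minor", "accept"}


def dispersion(recommendations):
    """Mean and population standard deviation of the coded judgements."""
    values = [SCORES[r] for r in recommendations]
    return mean(values), pstdev(values)


def single_reviewer_risk(recommendations):
    """Share of reviewers whose verdict alone would hinder publication.
    Assumption: 'reject' or 'major' counts as a significant obstacle."""
    hindering = sum(r in ("reject", "major") for r in recommendations)
    return hindering / len(recommendations)


def classify_pair(a, b):
    """Hypothetical decision rule for a two-reviewer panel."""
    pair = {a, b}
    has_reject = "reject" in pair
    has_favourable = bool(pair & FAVOURABLE)
    if has_reject and not has_favourable:
        return "rejected"          # reject + reject, or reject + major
    if has_reject or (a == "major" and b == "major"):
        return "uncertain"         # split verdict, or both ask for major changes
    return "likely accepted"


def pair_outcomes(recommendations):
    """Outcome shares over every pair of distinct reviewers in the audit."""
    pairs = list(combinations(recommendations, 2))
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    return {k: counts[k] / len(pairs)
            for k in ("rejected", "uncertain", "likely accepted")}


# Illustrative, made-up verdicts (NOT the data from the study).
audit = ["accept", "minor", "major", "reject"]
print(dispersion(audit))
print(single_reviewer_risk(audit))
print(pair_outcomes(audit))
```

Enumerating pairs without replacement uses only panels that could actually be formed from the audited reviewers; drawing pairs with replacement, or weighting reviewers differently, would be equally defensible assumptions and would give somewhat different figures.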
Álvarez, S.M.; Maheut, J. (2022). Protocol: Systematic literature review of the application of the multicriteria decision analysis methodology in the evaluation of urban freight logistics initiatives. WPOM-Working Papers on Operations Management, 13(2), 86-107. https://doi.org/10.4995/wpom.16780
Ariely, D. (2008). Las trampas del deseo. Cómo controlar los impulsos irracionales que nos llevan al error. Ed. Ariel.
Bedeian, A.G. (2004). Peer review and the social construction of knowledge in the management discipline. Academy of Management Learning & Education, 3(2), 198-216. https://doi.org/10.5465/amle.2004.13500489
Belur, J.; Tompson, L.; Thornton, A.; Simon, M. (2021). Interrater reliability in systematic review methodology: Exploring variation in coder decision-making. Sociological Methods & Research, 50(2), 837-865. https://doi.org/10.1177/0049124118799372
Benda, W.G.G.; Engels, T.C.E. (2011). The predictive validity of peer review: A selective review of the judgmental forecasting qualities of peers, and implications for innovation in science. International Journal of Forecasting, 27(1), 166-182. https://doi.org/10.1016/j.ijforecast.2010.03.003
Bornmann, L. (2011). Scientific peer review. Annual Review of Information Science and Technology, 45(1), 197-245. https://doi.org/10.1002/aris.2011.1440450112
Ernst, E.; Saradeth, T.; Resch, K.L. (1993). Drawbacks of peer review. Nature, 363(6427), 296. https://doi.org/10.1038/363296a0
Fiske, D.W.; Fogg, L. (1990). But the reviewers are making different criticisms of my paper: Diversity and uniqueness in reviewer comments. American Psychologist, 45(5), 591-598. https://doi.org/10.1037/0003-066X.45.5.591
Hirst, A.; Altman, D.G. (2012). Are peer reviewers encouraged to use reporting guidelines? A survey of 116 health research journals. PLoS ONE, 7(4), e35621. https://doi.org/10.1371/journal.pone.0035621
Kahneman, D. (2012). Pensar rápido, pensar despacio. Ed. Debate.
Kahneman, D.; Rosenfield, A.M.; Gandhi, L.; Blaser, T. (2016). Noise: How to overcome the high, hidden cost of inconsistent decision making. Harvard Business Review, 94(10), 38-46.
Kahneman, D.; Sibony, O.; Sunstein, C.R. (2021). Ruido. Un fallo en el juicio humano. Ed. Debate.
Krippendorff, K. (2011). Computing Krippendorff's alpha-reliability. Retrieved from https://repository.upenn.edu/asc_papers/43
LeBreton, J.M.; Senter, J.L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods, 11(4), 815-852. https://doi.org/10.1177/1094428106296642
Marin-Garcia, J.A.; Santandreu-Mascarell, C. (2015). What do we know about rubrics used in higher education? Intangible Capital, 11(1), 118-145. https://doi.org/10.3926/ic.538
Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Annals of Internal Medicine, 151(4), 264-269. https://doi.org/10.7326/0003-4819-151-4-200908180-00135
Rezaei, A.R.; Lovorn, M. (2010). Reliability and validity of rubrics for assessment through writing. Assessing Writing, 15(1), 18-39. https://doi.org/10.1016/j.asw.2010.01.003
Voskuijl, O.F.; Van Sliedregt, T. (2002). Determinants of interrater reliability of job analysis: A meta-analysis. European Journal of Psychological Assessment, 18(1), 52-62. https://doi.org/10.1027//1015-5722.214.171.124
Weller, A.C. (2001). Editorial peer review: its strengths and weaknesses. Ed. American Society for Information Science and Technology.
Copyright (c) 2023 Tomas Bonavia, Juan A. Marin-Garcia
This work is licensed under a Creative Commons Attribution 4.0 International License.
This journal is licensed under a Creative Commons Attribution 4.0 International License.