Are reproducibility rates a concern?

With 70% of researchers having failed to reproduce another scientist's experiments, how worried should we be?
15 November 2022

Interview with Marcus Munafo, University of Bristol




From the University of Bristol, Marcus Munafo is with us. He’s the chair of the UK Reproducibility Network, which describes itself as “a national peer-led consortium that aims to ensure the UK retains its place as a centre for world-leading research”...

Marcus - In this context, reproducibility means that if you were to run an experiment again, you would get the same results. So one foundation of science, if you like, is that if the findings that we generate are robust, then if we were to run the same experiment again, we should get the same result, more or less. Now, that's not always going to be true. There will be random variation in what we measure, so any single failure to reproduce or replicate the results of a single previous experiment doesn't in and of itself mean that that previous finding was false. But, by and large, if the results that we're generating are robust, then they should be replicable or reproducible across different experiments, or rather across the same experiment done by different people at different times in different labs, or in the same lab again at a different time.
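As a rough illustration of the point about random variation, here is a minimal simulation sketch. It is not from the interview: the effect size, sample size, significance threshold and function names are all assumptions chosen for the example. It repeats the same two-group experiment many times when the effect is genuinely real, and counts how often a single run reaches statistical significance.

```python
import random
from statistics import NormalDist, mean


def experiment_is_significant(rng, effect=0.5, n=30, alpha=0.05):
    """Simulate one two-group experiment and test the group difference.

    Assumptions (illustrative only): normally distributed measurements with
    unit variance in both groups, and a two-sided z-test at level `alpha`.
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(effect, 1.0) for _ in range(n)]
    # Standard error of the difference in means when both variances are 1.
    standard_error = (2.0 / n) ** 0.5
    z = (mean(treated) - mean(control)) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def replication_rate(effect=0.5, n=30, trials=2000, seed=1):
    """Fraction of repeated runs of the same experiment that are significant."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = sum(experiment_is_significant(rng, effect, n) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print("real effect, small samples:", replication_rate(effect=0.5, n=30))
    print("real effect, large samples:", replication_rate(effect=0.5, n=200))
    print("no effect at all:", replication_rate(effect=0.0, n=30))
```

Under these assumed numbers, a real effect studied with small samples is detected in only a fraction of runs, so one failed replication says little on its own. Larger samples raise that fraction, which is consistent with the emphasis on sample size later in the interview.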

James - And we've been quoting in the build-up to this programme the statistic that 70% of researchers have failed to reproduce other scientists' results. How did all this come to light? Surely when this came to light there was some sort of big response?

Marcus - Well, one of the problems is that this is one of those things that people who do research, who are in academic departments, know about, but that hadn't until relatively recently been looked at systematically. So I had that experience myself when I was doing my PhD. I tried to replicate a finding from the published literature that you would think was absolutely robust, and I failed to do so. And that made me think, "Well, maybe I did something wrong." But I was lucky enough to be reassured by a senior academic: "Well, actually that finding is notoriously flaky." Lots of people have that same problem. So it's only relatively recently that people have started to look at this systematically. So a few years ago, in psychology for example, there was the Reproducibility Project: Psychology, which attempted to reproduce a hundred studies drawn effectively at random from three major psychology journals. And they found that only about 40% of those findings could be reproduced. And that empirical approach to estimating the proportion of research findings that are robust, that can be reproduced or replicated, has now been extended to other fields. And we find very similar results across the range of different disciplines.

James - And Marcus, is this a problem across all sectors of science or are there particular hotspots?

Marcus - Well, we can't say we're certain because we've not looked everywhere, but certainly it seems to be, perhaps ironically, a fairly reproducible finding where people have looked empirically, in fields ranging from psychology through to cancer biology, through to economics. That general figure of about 40% of findings being reproducible seems to pan out, but we haven't looked everywhere. There are, I think, examples of fields that have gone further ahead on this journey, if you like, in terms of changing how they do things to ensure the robustness of the findings they generate. So, in genetics for example, there was a period of candidate gene studies (where you look at a single genetic variant in a single gene to see whether it's associated with some outcome, some phenotype), and those studies were notoriously unreliable. But then we moved into the era of genome-wide association studies, where you look across the whole genome in very large sample sizes, typically across multi-centre consortia, with very strict statistical standards for claiming discovery. And those findings are very robust. So there are certain fields where we can learn from those lessons and see whether they can be applied more broadly. But, in general, I think that this is a relatively universal problem because actually the drivers of this problem are to do with the sorts of things that incentivise the ways in which scientists work, for example.

James - So, statistically, how often should this happen in your opinion?

Marcus - Well, this is one of the really difficult questions because we don't really know what the optimal rate of reproducibility should be. So, on the one hand, you would want findings to be robust enough so that if someone else were to run the same experiment, they would get the same findings. On the other hand, we need to push the boundaries of knowledge. We need to take risks, we need to do a certain amount of blue-sky research where the findings are not certain. So I don't think it would be optimal for the rate of reproducible findings to be a hundred percent, for example. And it's not clear what that optimal value is. My personal feeling is that where it seems to be at the moment is probably too low, that we could do better than that in terms of ensuring the upfront quality of the research findings that we generate. But there perhaps needs to be a piece of work done, or a bit of thought put into exactly what that optimal trade-off is.

James - I suppose when I first heard the number of studies that were having difficulties with being reproduced, it made me a bit more concerned than, if you don't mind me saying, Marcus, it sounds like you are. Are we guilty here in the media of sensationalising this a bit? Is it really a reproducibility crisis? Because it seems less severe than perhaps I would have first anticipated.

Marcus - Well, don't get me wrong, I think there are real reasons to be concerned and I think there are lots of ways in which we can improve the ways in which we work and the environment within which we work, which is I think part of the issue: that the culture that generates the research that we produce has room for improvement. But I'm not a fan of the crisis narrative for a couple of reasons. First of all, I think it's potentially a little bit hyperbolic. I think it overstates the nature of the problem more than the extent of it. And I think it implies that it's a recent phenomenon and that, if we fix it, we can walk away from it and we don't need to worry about it again. I think the issues are actually deeper than that. And what we need to do is think about, more fundamentally, how science has changed and whether or not many of the ways in which we do science, communicate science, need to be updated, and think about how we can move into a mode where we're constantly reflecting on how we work and whether or not there is room for improvement. And evidencing that through research on how we do research, meta research if you like, so that we can always be improving the quality of what we produce by thinking about the ways in which we work and how we produce it. So I do think there are problems. I think there are many things that we can do better and we need to be putting more effort into thinking about embedding those ways of doing things better and evidencing whether or not they have the impact that we intend. But I'm not sure calling it a crisis is particularly helpful because I think that can just be a bit distracting.


