Reproducibility: science's consistency issue

What use are scientific findings if they can't be reproduced?
15 November 2022
Presented by Chris Smith, James Tytko
Production by Will Tingle.


Headline about cancer


This week, we’re talking about the so-called scientific reproducibility crisis: an alarming-sounding study released earlier this year concluded that fewer than one third of breast cancer research papers had reproducible results. So who’s to blame?

In this episode

A woman with her back to the camera in a lab, using scientific equipment

00:57 - Are reproducibility rates a concern?

With 70% of researchers having failed to reproduce another scientist's experiments, how worried should we be?

Marcus Munafo, University of Bristol

From the University of Bristol, Marcus Munafo is with us. He’s the chair of the UK Reproducibility Network, which describes itself as “a national peer-led consortium that aims to ensure the UK retains its place as a centre for world-leading research”...

Marcus - In this context, reproducibility means that if you were to run an experiment again, you would get the same results. So one foundation of science, if you like, is that if the findings that we generate are robust, then if we were to run the same experiment again, we should get the same result, more or less. Now, that's not always going to be true. There will be random variation in what we measure, so any single failure to reproduce or replicate the results of a single previous experiment doesn't in and of itself mean that that previous finding was false. But, by and large, if the results that we're generating are robust, then they should be replicable or reproducible across different experiments or, rather, the same experiment done by different people at different times in different labs, or in the same lab again at a different time.
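Marcus's point about random variation can be illustrated with a small simulation. This is only a sketch under invented assumptions, not anything described in the programme: the effect size (half a standard deviation), the group sizes, the simple cutoff of 1.96 standard errors and the function names are all hypothetical. It repeats the same two-group experiment many times and counts how often a real effect is detected.

```python
import random
import statistics


def run_experiment(rng, true_effect, n):
    """Simulate one two-group experiment and report whether it detects the effect.

    Assumption for illustration: measurements are normally distributed with
    standard deviation 1, and "detected" means the difference in group means
    exceeds 1.96 standard errors.
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    std_err = (statistics.variance(control) / n + statistics.variance(treated) / n) ** 0.5
    return diff / std_err > 1.96


def replication_rate(true_effect, n, trials=2000, seed=0):
    """Fraction of repeated experiments that detect the effect."""
    rng = random.Random(seed)  # fixed seed so the simulation itself is repeatable
    hits = sum(run_experiment(rng, true_effect, n) for _ in range(trials))
    return hits / trials


# A genuine effect studied with small groups is often missed by chance alone...
print("small study, real effect:", replication_rate(0.5, n=30))
# ...while much larger groups detect the same effect far more reliably.
print("large study, real effect:", replication_rate(0.5, n=200))
# With no effect at all, a small share of experiments still look positive.
print("small study, no effect:  ", replication_rate(0.0, n=30))
```

Under these assumptions, a single failed replication of a small study says little on its own, which matches the caution Marcus describes.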

James - And we've been quoting in the build-up to this programme the statistic that 70% of researchers have failed to reproduce other scientists' results. How did all this come to light? Surely when this came to light there was some sort of big response?

Marcus - Well, one of the problems is that this is one of those things that people who do research, who are in academic departments, know about, but hadn't until relatively recently been looked at systematically. So I had that experience myself when I was doing my PhD. I tried to replicate a finding from the published literature that you would think was absolutely robust, and I failed to do so. And that made me think, "well, maybe I did something wrong." But I was lucky enough to be reassured by a senior academic, "Well, actually that finding is notoriously flaky." Lots of people have that same problem. So it's only relatively recently that people have started to look at this systematically. So a few years ago, in psychology for example, there was the Reproducibility Project: Psychology, which attempted to reproduce a hundred studies drawn effectively at random from three major psychology journals. And they found that only about 40% of those findings could be reproduced. And that empirical approach to estimating the proportion of research findings that are robust, that can be reproduced or replicated, has now been extended to other fields. And we find very similar results across a range of different disciplines.

James - And Marcus, is this a problem across all sectors of science or are there particular hotspots?

Marcus - Well, we can't say we're certain because we've not looked everywhere, but certainly it seems to be, perhaps ironically, a fairly reproducible finding where people have looked empirically in fields ranging from psychology through to cancer biology, through to economics. That general figure of about 40% of findings being reproducible seems to pan out, but we haven't looked everywhere. There are, I think, examples of fields that have gone further ahead on this journey, if you like, in terms of changing how they do things to ensure the robustness of the findings they generate. So, in genetics for example, there was a period where candidate gene studies (where you look at a single genetic variant in a single gene to see whether it's associated with some outcome, some phenotype) those studies were notoriously unreliable. But then we moved into the era of genome-wide association studies where you look across the whole genome in very large sample sizes, typically across multi-centre consortia with very strict statistical standards for claiming discovery. And those findings are very robust. So there are certain fields where we can learn from those lessons and see whether they can be applied more broadly. But, in general, I think that this is a relatively universal problem because actually the drivers of this problem are to do with the sorts of things that incentivize the ways in which scientists work, for example.

James - So, statistically, how often should this happen in your opinion?

Marcus - Well, this is one of the really difficult questions because we don't really know what the optimal rate of reproducibility should be. So, on the one hand, you would want findings to be robust enough so that if someone else were to run the same experiment, they would get the same findings. On the other hand, we need to push the boundaries of knowledge. We need to take risks, we need to do a certain amount of blue sky research where the findings are not certain. So I don't think it would be optimal for the rate of reproducible findings to be a hundred percent, for example. And it's not clear what that optimal value is. My personal feeling is that where it seems to be at the moment is probably too low, that we could do better than that in terms of ensuring the upfront quality of the research findings that we generate. But there perhaps needs to be a piece of work done or a bit of thought put into exactly what that optimal trade off is.

James - I suppose when I first heard the number of studies that were having difficulties with being reproduced, it made me a bit more concerned - if you don't mind me saying, Marcus - than it sounds like you are. Are we guilty here in the media of sensationalising this a bit? Is it really a reproducibility crisis? Because it seems less severe than perhaps I would have first anticipated.

Marcus - Well, don't get me wrong, I think there are real reasons to be concerned and I think there are lots of ways in which we can improve the ways in which we work and the environment within which we work, which is I think part of the issue: that the culture that generates the research that we produce has room for improvement. But I'm not a fan of the crisis narrative for a couple of reasons. First of all, I think it's potentially a little bit hyperbolic. I think it overstates the nature of the problem more than the extent of it. And I think it implies that it's a recent phenomenon and that, if we fix it, we can walk away from it and we don't need to worry about it again. I think the issues are actually deeper than that. And what we need to do is think about, more fundamentally, how science has changed and whether or not many of the ways in which we do science, communicate science, need to be updated, and think about how we can move into a mode where we're constantly reflecting on how we work and whether or not there is room for improvement. And evidencing that through research on how we do research, meta research if you like, so that we can always be improving the quality of what we produce by thinking about the ways in which we work and how we produce it. So I do think there are problems. I think there are many things that we can do better and we need to be putting more effort into thinking about embedding those ways of doing things better and evidencing whether or not they have the impact that we intend. But I'm not sure calling it a crisis is particularly helpful because I think that can just be a bit distracting.


Variety of chemotherapy drugs in vials and an IV bottle.

07:47 - Cancer research shows poor reproducibility

What implications such discrepancies could have on researchers' reputations...

Tim Errington, Centre for Open Science

‘Crisis’ might be a bit of an exaggeration, but it’s still important to understand the effects that reproducibility, or lack of it, can have on science-led sectors like medicine. Tim Errington, from the Centre for Open Science in Virginia, has led an initiative to explore the reproducibility of studies on cancer, the results of which he’s published in the journal eLife.

Tim - So we started this project eight years ago, and the way that we decided to go about testing it was to make sure that we could first start with the original papers. So identifying all the information that we could, trying to work with those original authors to understand exactly the way the original research was done. And then we worked with independent researchers from those labs to see if they could conduct it again. The key there was just trying to make sure that we could have no reason beforehand that we wouldn't get the exact same results. And we did that over eight years, looking at a variety of different papers that were published in cancer biology.

Chris - So how did you choose those papers? Were those ones that were judged to be really seminal in the field or the kinds of papers that really direct or drive a field in a certain direction, therefore sort of lynchpin findings that everyone else is hanging research on? Or was it just a random selection of "we'll test this, test this, test this" and get someone else to see if they could effectively follow the same recipe to the same result?

Tim - So the approach we took here was to look at using that word impact - being careful there - which is really what you were just getting at. What were the papers, the findings, when we started this that were making and getting the most attention in the research literature? Who were people reading? Who were they downloading and citing the findings of? When we started it, these papers were just published, but we were hunting for ones that were getting the most attention because we thought, exactly what you were saying, that, "well, let's look at these ones because they're the ones that are going to have the broadest implications that will presumably drive those fields forward. So let's see how reproducible they are."

Chris - And when you did that, what was the result? How many of those really high impact or important field driving publications did you manage, with your independent teams, to reproduce the same results from?

Tim - Looking at a variety of measures, it was definitely less than half. It's sub 50% that we found. And so that in itself I think is an interesting thing to look at: the number. What I think is more interesting is also some tidbits in there about what that means. So two big aspects we found were that it was really hard to understand transparency of those findings, right? The data wasn't always shared. Methods - those methodology details were lacking even with talking to the authors. We couldn't always figure this out. And the materials, those reagents that were used weren't always easily available. We couldn't get them from anywhere. So that was one part which was a hard process to even attempt. And then the second one was, the one that sticks out to me more, is that effect size, right? The practical significance of those findings. So compared to those original outcomes, our replications were 85% smaller on average. A large effect size means that it's going to have a practical significance, especially in the cancer biology space versus the smaller effect size that we're finding, which kind of suggests that maybe there's not a practical application for it.

Chris - This is not about finger pointing of course, but different scientists are trained different ways with different motivations in different parts of the world. Did you test that or did you look at just one country's science when you were doing this?

Tim - Yeah, that's a great question. We did not look at a country's science. The approach we took was just what was being published in the literature. What was getting that attention? So the original papers that we had were largely based in North America and Western Europe to be honest. But findings from labs all over, we didn't tease apart this aspect. There are other projects that are trying to do that - look at just a single country and ask, "well, how does that look if we just look at a single country's output?"

Chris - I was wondering about countries where scientists are actively incentivized: publish a paper in a top tier journal and you get a year's pay on top of your normal salary, for example. I'm aware, people have told me, that that is the case, for example, in China, where the bonuses are huge if you publish in big journals. There's therefore an incentive to make sure that your science punches way above its weight, which could lead to some people exaggerating claims, etc. What are the implications of this? If it's in the cancer field and you have got results which are 85% better than they should be, let's say, does this mean then that people are potentially being misled about the validity of clinical treatments if they take what someone says they've found and it can't be reproduced?

Tim - I'll answer that two ways, yes and no. So all these early findings do definitely find their way out into the media, into the news, into social media as well as blog posts. It gets outside beyond the science sphere, right? And we know that that can impact behaviour and policy. We as researchers might find the most interesting study that tells us alcohol consumption does XYZ in terms of my cancer prediction, and so I might curb my behaviour or maybe red wine is good for me, so now I curb my behaviour the other way. So I think at first it can directly impact the individual themselves. It does obviously even impact at the care level. We know that a lot of these findings, especially when they're ones that are looking at diagnostic markers, for example, can find their way into treating patients, that there are doctors and clinicians that will actually take that evidence and move right with it, rightly so as they should. But the problem is, if it's not going to stand or we don't really understand how reproducible it is, that it might mislead them by accident. I think the last thing, if we really think about where all of this research is going, is that we're hoping it can find its way out into the public to actually make an impact. And this can actually slow that pipeline down, as we try to move findings, eventually trying to get them out into the public and be some type of intervention or drug or treatment that can actually help improve lives.

Chris - And just on that point, if you are a company and you are buying rights and patents to exploit a technology or a finding, does this mean potentially your view and your shareholders are being misled?

Tim - Yes and no. What I'm seeing and hearing - and it's worth saying this is anecdotal - what I'm getting at is there's hesitancy in terms of what we publish. In light of these findings and other ones, it's too good to be true, and that maybe you should just wait a little bit and get evidence from somebody else so that you don't get tricked the way that you just said. I think there's more hesitancy towards taking this and moving it rapidly into application.

Stacked coins

Why can't scientists replicate results?
Danny Kingsley, Open Access Australasia

What is causing this to occur? According to Danny Kingsley, from the executive committee of Open Access Australasia, the structure of academia, and the pressure on scientists to either publish, or perish, is responsible, as she explained to Will Tingle...

Danny - Publishing these papers is ostensibly about communicating your research: to say, "I did some research and I found something out and this is what I found out." But in reality, the need to publish papers is something that researchers have to do for their careers. So if you can demonstrate that you've done some research in a particular area and that people thought it was important enough to write about it elsewhere, then you are more likely to get a grant than somebody who says, "I'm interested in doing this research, but I can't demonstrate that I've ever done any research before in the past."

Will - And does the need to publish skew the types of papers that end up being produced?

Danny - Yes, it does in a couple of ways. So one is there's a pure need for volume. In Australia, there used to be a system which just counted the number of papers that you published, and what was happening in that environment was that the number of papers increased dramatically. And the way that worked was, I might do some research, and so what I do is write four papers based on that research, just taking slightly different angles on the research outcomes that I've done rather than writing one. The other way is that need to try and publish in a journal that has a high journal impact factor. So "fancy pants" journals like Nature or Science that people may have heard of, they have very high impact factors and so they're quite prestigious journals to publish in. So it's very competitive to publish in those journals. The submission rate is much higher than the publication rate. So those sorts of journals have a very high rejection rate. Sometimes 95% of the articles that get submitted to those journals are rejected. So that means that there is an imperative for people who want to get published in those sorts of journals to have novel results: results that are surprising, and that unfortunately can mean that there are some poor practices on behalf of the people writing the work to make their results seem more novel. And sometimes it's fairly benign. It might be simply, "oh, that's a bit of an outlier. I won't mention that outlier because it actually makes it look slightly less interesting or less novel." But there are other times where it can be more problematic, which is things like what is called HARK-ing, which is hypothesising after the results are known.

Danny - So instead of saying, "I am seeking to find an answer to this question," getting my results, looking at the data and saying, "Yes, that question was validated," or "No, it proved not to be true," I do the research, look at the data and say, "Actually, I'm going to say that my question was this other thing, because then I can demonstrate with this data that I was right with my hypothesis." And it's worth noting that retractions of papers - when somebody finds there's a problem with a paper and it gets retracted from the record - tend to happen more often in high profile journals than they do in smaller journals, possibly because there are more eyeballs on those journals, but also quite probably because there is this need for novelty. And so that kind of poor practice is potentially more likely with papers that are submitted to those journals in the hope that they get published.

Will - So putting all of this together then, how do these factors all mean that there is a lack of reproducibility?

Danny - So reproducibility is complex. It's very difficult to reproduce exactly the same circumstances in exactly the same environment. So it's not surprising that there are situations where you can't exactly reproduce the outcomes, particularly if you're talking about studies that involve humans or animals, because they are obviously going to differ slightly each time. The lack of reproducibility is to do with things like the size of the study and those sorts of issues. But the reason why we are not doing a lot of reproducibility work, why we're not reproducing work to ensure that it is valid, is that it would not get rewarded, because the finding has already been published. So there is no value in reproducing. There's also a risk in trying to reproduce somebody's work, if you try and reproduce it and are unable to. You've got to call it then and say, "Professor Jones' work doesn't stand up." And if you are a subordinate to Professor Jones, that could be - what shall we say... career limiting.

Will - And science is hard. With all the best intentions in the world, you could attempt to reproduce someone's study, but the sheer number of nebulous parameters involved in every experiment means that something could be different that was out of your control.

Danny - It might be that there's a stack of magazines on the machine and that's affected something. It might be something that you don't even realise is affecting the outcome of your results that you haven't put into your methods because you don't think it's relevant, but turns out to be relevant.

Will - Is there need for a better communication of methodology? Because sometimes scientists try and replicate work, but they weren't given the full instructions.

Danny - Yeah, there is actually. It's quite interesting. There are a couple of journals now which are video journals, where you video the experimentation process. That gives you the ability to see the environment you are in. So it does give a different view, literally, of how the experiment was undertaken. That does allow a different way of communicating. It does of course involve a different way of setting yourself up when you're doing your research, and also the process of editing that and sending it off for publication. There are extra steps associated with that. And of course that then means that's time away from you writing papers that are going to get you the reward. So there is something of a selflessness in the people who are experimenting with this type of thing. But as we make it more normal, then we're going to end up with a better result, literally for them, and for us, our society, in terms of better use of funds of our research, because that often is taxpayer money, and also better outcome for the research process.

Will - We do not wish to alarm people, but how widespread would you say that this problem is?

Danny - There are many, many papers that are not reproducible, for many of the reasons that we are talking about today. But the issue of deliberate fraud, of irreproducibility because somebody has deliberately done the wrong thing, is very, very minor. We need to understand that science by its nature is questioning itself. It's never finished. So any outcome, any result needs to be built on by others, then reproducing some of that work or taking that idea and building it into something else. So we are always questioning the results in science. That is a normal thing to do. But what we don't want to be doing is questioning the endeavour of science itself.

A thumbs up

21:52 - Solving this scientific hitch

Making sure reproducibility remains a central pillar of the scientific process...

Andrew Holding, University of York

So what can we do about it? Will also spoke with biochemist Andrew Holding, who’s at the University of York and has himself had run-ins with irreproducible science from other people that took him a whole post-doc to sort out, about the short and long term solutions that could help scientists to move towards more open and reproducible methods of research and publication...

Andrew - The main win I see for science in general in challenging reproducibility is, as more and more biology research becomes computationally based - so these are technologies like genomics, proteomics, many words that end with omics - they use a lot of mathematical methods, and we can publish the code and the data so someone can literally download that and run it on their computer. And that is a massive win for science because that means a person can reproduce the data analysis in an afternoon, maybe a bit longer if it's a little bit challenging to run the code. And then you can see how the thing works. To me, part of the science is the coding. That's a really quick win. If we normalise that behaviour, we can then have people building on that science. We can grow science quicker instead of this idea that we have to keep it hidden and safe in case someone finds a mistake in it, because most people aren't producing irreproducible science on purpose.

Andrew - And I think letting people make mistakes and letting people see the workings, how you got your answer, is a huge plus. And I don't think there's any harm in saying, "look, we've been pushing people's papers, their research looked like 10/10 results." Let's just relax and say this is pretty convincing. And then people can honestly show the weaknesses in their work too. And that is a cultural change which, given the competition in science, is slow to happen, but it is happening. And certainly, on the computational work, the things I mentioned about people giving the code, that's changing a lot more rapidly because that's quite a new field and there's a lot more willingness to try new things. Whereas, I think, with the momentum in wet lab experiments and the established techniques, you're trying to change something that's been like that for 50 years, and that's a lot harder, but people are still coming round to the benefits of this.

Andrew - So those small wins are there. Let's put aside the absolutely fraudulent people, which is probably the absolute minority of the problem, and look at the genuine mistakes made by honest scientists who want to get the best science. How can we make it so if they make one of those mistakes, the next paper says, "you know what? I think this" and builds on it. That standing on the work that came before you, that's science, not presenting a beautifully polished piece of work that meets a set of criteria that we sort of made for ourselves that don't really exist.
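Andrew's earlier point about publishing code and data so that someone else can rerun the analysis can be sketched in a few lines. This is a hypothetical illustration rather than any tool or workflow mentioned in the interview: the measurements, the seed value and the function names are invented. The idea shown is that recording a fingerprint of the input data and fixing the random seed lets a reader confirm they are analysing the same values and obtain identical numbers.

```python
import hashlib
import json
import random
import statistics


def fingerprint(values):
    """Hash the input data so a reader can check they hold the same values."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def analyse(values, seed=2022, resamples=1000):
    """Bootstrap an interval for the mean, using a fixed seed so reruns match."""
    rng = random.Random(seed)
    means = []
    for _ in range(resamples):
        sample = [rng.choice(values) for _ in values]  # resample with replacement
        means.append(statistics.mean(sample))
    means.sort()
    return {
        "data_sha256": fingerprint(values),
        "seed": seed,
        "mean": statistics.mean(values),
        "ci_low": means[int(0.025 * resamples)],
        "ci_high": means[int(0.975 * resamples) - 1],
    }


data = [4.1, 3.8, 5.0, 4.4, 4.9, 3.7, 4.6]  # hypothetical measurements
first = analyse(data)
second = analyse(data)
print("identical on rerun:", first == second)
```

Publishing the script, the data and the recorded fingerprint together is one simple way a reader could check the analysis in the afternoon Andrew mentions, although real pipelines would also need to pin software versions.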

Will - And if we want to shift towards a more open access attitude towards research, is there anything that we can do to prevent institutions from just piggybacking on each other's research?

Andrew - This is something you see quite a lot of people saying: "Oh, if I publish my data open access, if I publish my source code open access, someone can just go and run my code and tweak a few parameters and get a paper out of it." And I'm like "great!" And I think that's an attitude change. So you've got to say, yes, people will piggyback on you. And what we can do is say, look, if someone is piggybacking on you, if someone sees your results and because they have better funding than another country and they can get ahead of you, that we don't see that as a bad thing, we see that as something like, this person did something so good, they generated a new field, they generated a new direction of science, and they don't feel that they're going to be vulnerable because of that. And that's something where sometimes because the way grant funding works and the way it is competitive, that people do feel vulnerable to someone getting ahead.

Andrew - In my experience, though, usually people who take what you've done and run with it run in a different direction. It's very rare that someone has exactly the same idea as you with exactly the same data, especially if you are the person who came up with it. And the benefits of being open and sharing and people expanding what you're working on for you, over the risks, I think mean we should embrace this. But that small concern that maybe you'll not have the next grant because someone's scooped you as we call it, I think we can make better protections to recognise that that is a vulnerability and make people feel more secure. But I think the benefits still massively outweigh that risk.

Will - And so to look longer term, how can we therefore ensure that academics have the environment they need to feel safe producing their work?

Andrew - I mean, this is a really complicated one. We've got the funding environment as it is and then we're looking forward to how we do that in the future. And, at the moment, science is funded quite often, certainly for the smaller research groups, with short-term grants, very competitive, with very low success rates: somewhere between 1 and 10% on quite a lot of these funding bodies. What you need to do is say, right, we can fund more science, we can support these people more. And we are not just going to go for people who have the biggest and flashiest science that we fund. We fund people for being consistent and reliable. And how we measure that, those metrics, I'm not going to give you an answer today because I don't think we know what the metrics are yet, because measuring how good science is, is such a challenge. But what I can say is, if we decide we want to change the metrics, we've got the people that can do it. You know, scientists spend their lives analysing data and if we can't work out how to get the outcome we want from science funding, then we are asking the wrong people, to be honest, because we should be able to do it.

Will - To finish off, the last thing we want to do is undermine all the vital scientific research that is done and is beneficial to all of us. So do you see this as a crisis or more as an opportunity?

Andrew - I think it's absolutely an opportunity. If we were to ignore it, stick our heads into the sand, it would become a crisis because people would lose faith and people would lose trust in it. What I am seeing is most of these issues are things that have gone wrong because of people making genuine mistakes. If they publish their real data, then people can correct that. And that's how science has always worked. We know plenty of stages in the history of science where there have been competing ideas. Sometimes they've gone backwards, sometimes they've gone forwards, but eventually we come up with a model that we build up. And so this is just another evolution of that ongoing scientific process. So I think this is a massive opportunity to say, "Look, we can do science better. We've seen the challenges", and identify, using the skills we have as a scientific community, where best to put resources to get the best outcome for everyone who is investing in us as scientists. So that could be charities, that could be governments, and they can then see better results and a more diverse set of results that don't just focus on trying to get there first, to get the biggest splash in the newspapers, to get the next pot of cash.

