AI-generated research threatens to pollute the corpus
Interview with Jennifer Wright, Cambridge University Press & Marie Souliere, Frontiers
AI can make research faster and more in-depth when used correctly, but AI confabulation, like claiming Patrick Moore created the Naked Scientists, could have serious consequences for the quality of the scientific literature. We are already seeing studies and review articles written using AI ‘essay mills’: AI-generated pieces of writing that contain completely made-up information. They’ll include invented references created to support a particular line of argument, or “facts” that are just plain wrong. But they look and read so well that the scientific wheat can be hard to separate from the chaff. Some of these studies are slipping through the cracks and getting into journals. In other cases, people are generating articles with AI and sending them to paid-publication outlets, where others may rely on them without realising that a human never created or checked them. Other AIs may then ingest and regurgitate the same information. This threatens the integrity of published studies and the trust between everyday people and scientific information. To find out how trust in the scientific community can be maintained in the age of AI, here’s Jennifer Wright, head of publication ethics and research integrity at Cambridge University Press…
Jenny - When you're a scientist you look at someone else's research and you think, is this good? Is this really correct? Is there something that they've missed? Is there something they should have accounted for? Is that method the right approach? And this is essentially what we do today through a process called peer review. So experts in a topic take a paper, they look at it, they grill the work, they critique it and then they determine whether or not it's suitable for publication at all or maybe whether there are some improvements that the author or researcher could make to make it suitable.
Chris - And obviously your venue has a reputation to defend, but not all venues do that, do they? Not all publishers operate the way Cambridge does. There are some where I just send them a very large cheque and they publish anything. I've come across these on the internet.
Jenny - Unfortunately this is becoming increasingly challenging. There are so-called 'hijacked' journals which are essentially really sophisticated spoofs of real journals. So let's say you had the ‘Journal of Interesting Research’ and that's a legitimate journal by a legitimate publisher. There might also be another ‘Journal of Interesting Research’ and that is a predatory journal or a hijacked journal and they can be very sophisticated, so researchers and readers might not even realise that they're looking at a predatory journal.
Chris - What's the purpose of those places existing, though? Is this people burnishing their CVs? If I want some extra publications to make myself look like a better scientist than I am, I can generate some content and send it to one of these journals that ask few questions but will just produce the publication, and that turns into CV points for me. Is that the purpose?
Jenny - There are probably a lot of motivations, as with anything like this. It could be naivety, where the researcher doesn't realise this isn't a legitimate outlet, or it could be the pressure to publish, which you've alluded to: a lot of academic careers are built on your publication record. So a shortcut to a publication record might be attractive, particularly in parts of the world with really strict requirements where, for example, you can't graduate or get a promotion until you have a paper. With that incentive structure it becomes very difficult to resist, perhaps.
Jennifer Wright. Bogus research and predatory journals are nothing new, but recent advances in generative AI have turbocharged the ease with which content can be produced, with plausible papers now popping up at the click of a button. These studies are often littered with confabulations, which unsuspecting readers might mistake for legitimate findings. Here’s Marie Souliere, Head of Editorial Ethics and Quality Assurance with the open-access publisher Frontiers…
Marie - The real concerns are about inaccurate or false content: what we refer to as hallucinated content from the artificial intelligence, hallucinated references, AI plagiarism, or poor attribution of content without referring to the real source. That is what we are really concerned about, rather than AI being used for the benefits it can provide, because it is a very efficient and effective tool for analysing research data. It can free up time for researchers to carry out more research, and it really supports non-native English speakers, who make up the majority of researchers in the world.
Chris - Given that it can churn out very slick, very nice-looking content that's extremely plausible and could therefore deceive reviewers - because unless they're going to check everything in excruciating detail, things could slip through - does that not worry you? That this is going to create a lot of work for reviewers to be able to say, "Honestly, I've really, really checked this and I've checked every reference, I know absolutely this is rock solid.” Which, let's face it, no one's got enough time to do exactly that.
Marie - Agreed, and what we've been doing is, in a way, fighting AI with AI. A lot of tools have been created to support publishers in the industry, all aimed at the root of the problem. We cannot reliably detect AI-generated content itself: it's very difficult, and the accuracy is very low. So instead we focus on everything else that might be wrong with an article and could be a symptom of a fake paper. We look for tortured phrases, problematic content, data that looks a bit dodgy, gibberish images and wrong attributions, so that these papers don't get published and become part of the literature.
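To make Marie's point concrete, here is a minimal sketch of the kind of symptom-based screening she describes: checking a manuscript for 'tortured phrases', the reworded stock terms that often betray machine-generated or disguised text. The phrase list and the screen_manuscript function below are invented for illustration; this is not Frontiers' actual tooling.

```python
# Illustrative sketch only: a toy "tortured phrase" screen. Documented
# real-world examples include "counterfeit consciousness" standing in for
# "artificial intelligence". The list here is hypothetical, not any
# publisher's production tool.

TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound learning": "deep learning",
    "irregular woodland": "random forest",
    "flag to commotion": "signal to noise",
}

def screen_manuscript(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, probable original term) pairs found in text."""
    lowered = text.lower()
    return [(bad, good) for bad, good in TORTURED_PHRASES.items() if bad in lowered]

for bad, good in screen_manuscript("We trained an irregular woodland classifier."):
    print(f"Suspect phrase: '{bad}' (probable source term: '{good}')")
```

A real pipeline would combine many such weak signals, alongside image forensics and reference checks, before flagging a paper for human review.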
Chris - That is the case, but there are lots and lots of journals that don't have the kinds of checks and balances you have. The problem that stems from that is that the AI platforms people are using and training are romping their way around the internet, sucking all this stuff up and incorporating it into the models that then generate papers that do go into legitimate sources and reputable venues like those you publish. So it has the potential to build up, like a layer of plastic on the seafloor, and pollute the knowledge space for years to come, doesn't it?
Marie - Absolutely, and this is a frequent concern raised by researchers and publishers alike. I was at the Frankfurt Book Fair last year at a big event; there was a whole AI day with publishers. People were talking about how these non-peer-reviewed articles, on arXiv or from predatory publishers, are taken with the same level of legitimacy as peer-reviewed articles, and the AIs, or their developers, are not taking this into consideration. The real articles, versus the arXiv ones, are tagged with a little label saying they've been peer-reviewed, so the AIs should be able to make the distinction; but not with the predatory publishers, and that's the biggest risk.
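The peer-review label Marie describes suggests a simple guard on the model-training side: only ingest records that are positively marked as refereed. The sketch below is hypothetical, assuming a made-up record format with a peer_reviewed flag; in practice that signal would have to come from trustworthy publisher or registry metadata, which is exactly what predatory journals lack.

```python
# Illustrative sketch: filter a training corpus on a peer-review flag before
# ingestion. The record format and 'peer_reviewed' field are hypothetical.

records = [
    {"title": "Refereed study", "peer_reviewed": True, "text": "..."},
    {"title": "Preprint", "peer_reviewed": False, "text": "..."},
    {"title": "Unknown venue", "text": "..."},  # no flag at all
]

def ingestible(record: dict) -> bool:
    """Keep only records positively marked as peer reviewed."""
    return record.get("peer_reviewed") is True

corpus = [r for r in records if ingestible(r)]
print(f"Kept {len(corpus)} of {len(records)} records")
```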
Marie Souliere from the Frontiers publishing house. Increased vigilance from publishers, editors and researchers is crucially important to combat this problem, but another way to beat AI pollution of the knowledge pool could mean changes to the way scientists document their work. Here’s Jennifer Wright from Cambridge University Press again…
Jenny - We also promote open science principles at Cambridge, so you can think of this as the show-your-working mode of research. It's not enough to just present the outcome; the researcher needs to be able to show the journey, ideally whilst they're doing it. A paper trail of: this was my lab work, this is the microscope I used, this is when I went to the field. That's much harder to fabricate than some words on a page, so I think that's something else that builds trust, being able to see that journey. It's also about collaboration across publishers, institutions, funders and libraries, all coming together to ask what we can really do about this pressure-to-publish culture and its incentive structures: how can we turn down the tap rather than just filtering the water? Cambridge is working on this just now; we've got a white paper coming out in the autumn which will look at some of these big systemic challenges and what we can actionably do about them.
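One way to make the 'show your working' paper trail Jenny describes hard to fabricate after the fact is a tamper-evident log, where each entry's hash covers the previous entry, so rewriting history breaks the chain. The sketch below is a toy illustration of that idea with invented field names, not a description of Cambridge's open-science infrastructure.

```python
# Illustrative sketch: a hash-chained research log. Each entry commits to the
# previous entry's hash, so retrospective edits are detectable.

import hashlib
import json
import time

def add_entry(log: list[dict], note: str) -> None:
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {"time": time.time(), "note": note, "prev": prev}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)

def verify(log: list[dict]) -> bool:
    prev = "0" * 64
    for e in log:
        body = {k: e[k] for k in ("time", "note", "prev")}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if e["prev"] != prev or e["hash"] != digest:
            return False
        prev = e["hash"]
    return True

log: list[dict] = []
add_entry(log, "Imaged samples; microscope model and settings recorded")
add_entry(log, "Field sampling at site B; GPS track attached")
print("Chain intact:", verify(log))
```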