AI-generated science papers
One study published this week has demonstrated that ChatGPT, as well as being more than capable of generating homework indistinguishable from that of a real student, can even pen scientific content to a standard that allows it to go undetected as computer-made. Catherine Gao is a critical care physician at Northwestern University with a side hustle in machine learning research. She saw what ChatGPT was capable of and wondered how it would do at writing science abstracts - the summaries of the results of study papers published in journals. It blew her expectations out of the water and compelled her to systematically test just how indistinguishable it was from science written by real scientists…
Catherine - So one, we wondered if it would set off plagiarism detectors. The ChatGPT abstracts performed very well. They scored on average 100% original. So really not using any plagiarism in the traditional sense.
Chris - What that's telling you is that ChatGPT is not just going to some source online and grabbing wholesale that abstract and regurgitating it. It's generating content that is not in existence anywhere else.
Catherine - That's exactly right. It's really writing these abstracts from scratch. Another task that we looked at was whether or not it would score sort of high on different types of detectors. They exist online, these AI output detectors that you can run texts through. And so the real abstracts all scored very, very low. Most of them scored 0.02% fake. Whereas the majority of the ChatGPT abstracts scored much higher, with a median score of 99.98% fake.
Chris - So a machine can spot another machine's work.
Catherine - Yes, exactly. Using the machines to detect the machines. That's very right.
Chris - What about, though, if you then thrust those results that it had generated in front of not a machine but people?
Catherine - Like I said, I was really impressed at how good it is. So we gave collections of 25 abstracts that were a mixture of real abstracts and generated abstracts to different team members. So these are all members who are within our biomedical sciences lab. So they're used to engaging with science. We said some of these are real, some of these are generated. Please give us a binary score of which one you think this is, and then you're also welcome to give us some notes to what made you think one way or the other. Even knowing that there are generated abstracts in this combined list, our human reviewers were only able to identify generated abstracts 68% of the time. And these were very, very suspicious, skeptical reviewers. They were so suspicious that they even thought 14% of the real abstracts were generated.
Chris - What about the quality of the content? Because you haven't said anything about that yet, whether or not when you looked at what the machine was saying, it was factually accurate?
Catherine - You know, at first I thought maybe it would provide some vague summary that was in the realm, but what we found is that in the generated abstracts ChatGPT actually came up with completely fabricated numbers for their results. Basically reporting full studies that just came out of the ether. What was really surprising to me was that it could hallucinate these numbers and present them in a way that still seemed factually sound enough that a reader might not be able to tell the difference. If our reviewers had been able to tell that 95% of the generated abstracts were generated, I think that would be reassuring. 68% is not that good, and they even knew some of these abstracts were generated. So I think if someone came across the abstract in the wild, or if they were reviewing stuff, they might not realise that large language models have gotten so good at generating them and probably wouldn't think that it could be fake.
Chris - People are also raising concerns about, for instance, the use of tools like this to generate webpage content because, on the web, traffic is everything. Getting people to come to a resource, you throw adverts at them, you make revenue that way, you have a high-footfall site because you are creating content for your webpage. That was the bottleneck because that was where a person had to be involved and that's where money had to be involved.
Catherine - I think it gets to some very interesting questions about where do we go from here. In one way, could this be used in the hands of a responsible scientist to help take the burden off writing which sometimes can be, like you said, one of the bottlenecks of disseminating scientific work. Could it help improve equity across specifically scientists who have to write in a language that's not their own? What worries me also is, what if this technology is used for evil, right? There are these organisations that exist out there called paper mills that are basically generating scientific content for profit. Now, with this technology that's so powerful, that's accessible and free, could this be used by these nefarious organisations to spam science that's factually incorrect and dangerously convincing?
Chris - Well, could you go a step further and say, I've got a pharmaceutical company, it's not a very good one. It's deceitful and it wants to push a product. So what it does is generate hundreds of papers supporting a drug that it's invented, saying how good it is, encouraging real organisations to buy in, either investors or organisations who want to buy the drug or the product making money for that venture, when in fact it's all founded on fake science?
Catherine - The data that these models are trained on is detailed enough that it even knows the right range of patient cohort sizes to present in the generated results. For example, when we asked ChatGPT to write an abstract about a study on diabetes, it included huge, huge numbers of patients because a lot of patients have diabetes, versus when we asked it to write an abstract about monkeypox, which is a much rarer, newer disease, it knew that the numbers needed to be much smaller. So certainly I think in the hands of these more nefarious or ill-intended users, it could be a very dangerous technology.