Are science papers getting harder to read?

22 November 2017

Interview with

William Thompson, Karolinska Institute

Are science papers getting harder to read? William Thompson tells Chris Smith why the increasing use of scientific jargon is affecting intelligibility.

William: We were four PhD students who were frustrated by some of the scientific texts we were reading. We had journal clubs together, and there was one author in particular whose work we kept reading and found hard to read. We joked at first: has this person always been hard to read, or is it something that has developed as the ideas have developed?
We realised we could quantify this, and as soon as we could quantify it for one person, we saw that we could apply the same tools to a much larger corpus: over 700,000 scientific abstracts. Once we realised that, we thought it would be really interesting to work together on a study exploring this question.

Chris: So what did you actually do in this study, and what bounds did you set on it?

William: We made a list of 123 highly cited journals from 12 different fields whose abstracts could be downloaded from PubMed, because the tools we had built were for downloading abstracts from there. Then we tried to quantify the readability of each abstract, using measures such as the number of words per sentence and the number of syllables per word, to estimate how hard it is to read.
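To illustrate how counts like words per sentence and syllables per word can be turned into a single readability score, here is a minimal sketch. The interview does not name the exact formula the study used. This example assumes the classic Flesch Reading Ease score, a standard measure built from exactly these two quantities, in which higher scores mean easier text. The syllable counter is a crude vowel-group heuristic, and the function names and example sentences are illustrative only, not the researchers' code.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not a dictionary lookup): count runs of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "make") usually adds no syllable,
    # but "-le" endings (as in "table") usually do.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier-to-read text."""
    # Naive sentence split on '.', '!' and '?'; abbreviations such as "e.g." will confuse it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Longer sentences and longer words both pull the score down.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    plain = "The cat sat on the mat. It was warm."
    dense = ("Transcriptomic heterogeneity characterizes "
             "neuroinflammatory microenvironments.")
    print(f"plain: {flesch_reading_ease(plain):.1f}")
    print(f"dense: {flesch_reading_ease(dense):.1f}")
```

Because the formula subtracts penalties for longer sentences and for more syllables per word, a score that falls over the decades corresponds to text becoming harder to read, which is the kind of downward readability trend William describes.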

Chris: What time frame did this span?

William: The earliest article was from 1881, but most of the scientific literature we could get our hands on starts around 1960, and it runs up to 2015.

Chris: When you run the text from those abstracts through the analysers and ask them to score the language that’s being used, what trend emerges?

William: As I alluded to before, we thought the texts were hard to read, and we found a large downward trend in readability, meaning that texts are harder to read now than they were previously. That wasn’t too surprising for us, but we were surprised by how strong the downward trend was.

Chris: You looked at a whole raft of different journals, which means that you could consider different scientific disciplines. So, are any disciplines particularly prone to this, or do all scientists across the board now have a tendency towards over-complexification, using long, multisyllabic words where simpler terms would do?

William: I think the important take-home message was that all the fields we looked at were getting worse. There were some differences: for example, clinical medicine was the least bad, and molecular biology was the worst on one of the metrics. But I don’t think the emphasis should be placed on that; it should be placed on the fact that all fields were getting worse.

Chris: How do you account for this?

William: With the data we had, we tried to explore two possible reasons. The first was whether the number of authors affects readability, because the number of authors per paper has been growing over time: you often see four or five authors on a paper today, whereas in 1960 it was one or two. The number of authors does have an impact on readability, so papers with more authors are generally less readable, but that doesn’t explain the trend.

Chris: So is it a bit of a case of ‘too many cooks spoil the broth’, then, when you have multiple authors? Too many people pulling in too many directions?

William: Exactly. I think we actually had that phrase in the paper.

Chris: Nonetheless, it’s not the strongest driver, because even though you’re seeing an effect, there’s still, over and above that, a strong signature of increasing complexity and inaccessibility over time, regardless of the number of authors.

William: Yes, exactly. So another hypothesis we had was that scientists may be drawing from an increasingly common vocabulary. We looked at the 3,000 most common words that scientists use and found that the use of these 3,000 words was increasing over time. Then we split this list into several categories, isolating one we called ‘general scientific jargon’, and these words were on the increase. So scientists are using more of a common scientific language, which we called ‘science-ese’, drawing on this vocabulary of general scientific jargon.

Chris: Do you think it really matters, though, that some papers are a bit impenetrable? Some people could argue, “Well, I’m a molecular biologist and it means something to me, and I don’t really care that much if an astrophysicist can’t read my molecular biology paper,” and vice versa.

William: I’m very sympathetic to that view, but at the same time, this entire endeavour started when PhD students in the field of neuroscience were having a hard time reading some neuroscience. So it can hamper people within the subject itself. Then, in a much wider context, interdisciplinary work is very, very important. Methods developed in one field can be very useful in another, and that cross talk between subjects usually leads to greater scientific progress. So being completely impenetrable to other disciplines is problematic.

And finally, science is not just for scientists. People should be able to use scientific knowledge to make society better. If wider parts of society, such as science journalists or policymakers, cannot access papers, or are having a hard time interpreting or understanding scientific text, then science can’t be used as effectively in a wider context.

