Overlooked COVID genes

Evidence that there are important genes in COVID that are being ignored in the literature
23 February 2021

Interview with 

Thomas Stoeger, Northwestern University


CGI images of DNA double helix


The pandemic has predictably led to a massive surge in scientific papers on the topic of COVID. Some journals report that their submission rates are up hundreds of percent. But is the science we’re doing comprehensive, or are there stones repeatedly being left unturned? Northwestern University’s Thomas Stoeger has found that stones are indeed being left unturned when it comes to the genetics of COVID, as he explained to Chris Smith…

Thomas - There is a beautiful resource curated by the National Institutes of Health which, for each publication, tells other scientists which genes are covered in that publication. And we filtered this for all the publications around COVID-19.

Chris - Well, there's been a lot of publications hasn't there. I mean, we've never seen a tsunami of work like this on one particular topic all at once. How many papers did that therefore entail? And how many genes did you end up considering?

Thomas - We considered 10,000 publications, everything that was out at that time. And we looked at all 20,000 protein-coding human genes, and we spotted roughly 4,000 of them in publications.

Chris - Right, so what we've got here is a list of all the genes that keep getting a hit and you know roughly how many genes there are in the human genome, so you can ask how often do each of the genes in the human genome get talked about in the context of coronavirus infection?

Thomas - Absolutely.

Chris - I'll take a guess that the number one hit was the gene for ACE2 - Angiotensin Converting Enzyme 2, which is the target that the virus uses to bind onto our cells and get in and infect us in the first place. Would that be a reasonable guess?

Thomas - Spot on, that's absolutely correct. So this is basically in first place, with the most publications.

Chris - And what comes next?

Thomas - It's genes that encode for proteins that signal from one cell to another that there's some virus infection going on.

Chris - How does this help us? And why have you actually done this? Because presumably one could just go and do a look up on a database and see if a particular gene has been looked at in the context of coronavirus and then research it. So what were you actually trying to flush out?

Thomas - We were looking for things that are not expected. So the top hits are very expected, and that is very reassuring because it means scientists are doing something that makes a lot of sense in the context of COVID-19. But then we were surprised when we compared the rest of the 4,000 genes to COVID-19 physiology. Since the Human Genome Project it's been possible to do experiments that query all our genes at once, for instance to find all the genes that go up or down in activity following infection with COVID-19. And we considered a few of those experiments. In total, this gives another list of 2,000 genes. And we found that those 4,000 genes and these 2,000 genes rarely overlap. So the top ones kind of overlap, but the majority of the 4,000 genes in the COVID-19 literature are not represented among these 2,000 genes for which there's very strong evidence that they relate to COVID-19.

Chris - Well, that's quite a striking finding, isn't it? Because what you're saying is that when you ask which genes actually change a lot when someone's infected with coronavirus and then ask, are these the ones we're looking at, you find that with a few rare examples, we're ignoring them.

Thomas - Absolutely.

Chris - Oh dear!

Thomas - But what these genes are really doing in the context of COVID-19, no one knows.

Chris - How did we end up sidetracked in this way then, where we focused our attention on 4,000 genes, but potentially ignored 2,000 others that are doing possibly quite important things and don't overlap with that 4,000 we've concentrated on?

Thomas - Science is really difficult, and people have found again and again that scientists start by revealing those findings that are the least difficult. And along these lines we found that the genes which are being published in the COVID-19 literature are genes which could already be studied very well by approaches that existed in the 1980s and 1990s.

Chris - It's almost like a big game of Scrabble, isn't it? Where once someone starts putting down some letters in one part of the board, everyone then jumps on that bit and makes loads of words in that area. And it means that there are other fertile parts of the board that often get underplayed, and that's sort of what's happened here. Do you think we therefore need more studies like yours when we've got a body of work ongoing like this, to highlight fertile areas so people's attention is refocused on things that they may have missed?

Thomas - I believe it's part of the puzzle. And we certainly do hope that studies like ours motivate further research and actually also provide other researchers with starting points for their own exploration.

Chris - What do you think we need to do about this then? Do we need to have a regular program of approaching the genetic space in the way that you have with this paper, so that we keep signposting people to overlooked or under-appreciated bits of the genetic landscape, and attention is focused back on those areas where there might be some fruit that can be harvested?

Thomas - Absolutely. That would be my hope. There have been some initiatives starting outside of COVID-19 to explore some of those genes that we know to be important for disease, but that no one is following up. But still, these initiatives are dwarfed by the bulk of the support that researchers can get elsewhere.

