The human genetic architecture of COVID-19

A new study - combining the work of thousands of geneticists - pins down the genes making COVID worse...
13 April 2021

Interview with 

Nathan Pearson, COVID-19 Host Genetics Initiative


The nighttime skyline of Dubai, featuring the Burj Khalifa.


Since March, we’ve been discussing how COVID-19 varies between different people depending on their genes. And back at the start the evidence was patchy. Look how far we’ve come: a recent study combines the work of a couple thousand geneticists, using DNA kindly contributed from millions of people around the world, to pin down which common genetic variations are doing us dirty. The study hasn’t yet been peer-reviewed, but it’s such a large collaboration that we’re going to spend the whole of this programme learning what they’ve found. Phil Sansom spoke to geneticist Nathan Pearson from the COVID-19 Host Genetics Initiative...

Nathan - Our own DNA matters in a lot of other health questions, including infectious diseases. So we have plenty of examples where human genetic variation shapes who gets a given infection, and maybe shapes how severely they get it, et cetera. So we had kind of a hunch going in that like other infectious diseases, this one might play out similarly. And given that that's our expertise, that's our bailiwick, what can we bring to the table along with everyone else - from virologists themselves, to public health scientists, to people studying all facets at every scale of society and our response to it - what can we bring to understanding how our responses vary perhaps in part by the genetic spellings, the DNA, in us?

Phil - And did you have a hunch about how much a role genetics would play?

Nathan - Personally I didn't go in with a strong hunch either way. And I think that people who are more expert in coronaviruses or in viruses generally, in our responses to them, might've gone in with stronger or weaker hunches on that front. But for me, it was sort of an open question. And I think for a lot of our colleagues we felt similarly: that we weren't going to put all our chips on that part of the board, but that we might have something to say. Let me give you an example there. One of the better studied viruses before this one that afflicts people was HIV. And we know, for example, that human genetic variation in a couple of parts of our genome strongly shapes who gets HIV generally, controls the load of that virus over time... it's a very different kind of virus, so we can't extrapolate too much from HIV because it stays in us, it's a retrovirus; but we knew that it played a role in addition to the variation - HIV1, HIV2, et cetera - in the virus itself.

Phil - Right, so what are you looking at here? Because you can't take a person and go, "I'm going to change this part of your DNA and then give you the virus and see what that does." You've got much more of a vague set of data to draw from.

Nathan - Right. Well, vague in some sense, but also quite comprehensive now. We're lucky to be living... in as much as we can be lucky at all during this pandemic, we're lucky to have it happen at a time where we've actually looked at many people's DNA. And one thing we will not be doing, at least in people's bodies right now - we might do it in laboratory cells - but we won't be tinkering with DNA to experiment like that. We don't do that with people's DNA. Rather, we look at the natural experiment of all the splendid variety of DNA among us from the people on the planet today, and see if we can spot patterns in who has particular genetic spellings. That is: if you look at spots on their chromosomes, they're spelled one way, whereas another person's copies of that chromosome are spelled differently. We look to see if we can spot patterns among many people - and luckily we can look at many people now because we've looked at a lot of genomes - to see if anything pops out as an 'aha' pattern that suggests something.

Phil - This is kind of a standard technique in genetics, isn't it? Am I right that it's a 'genome-wide association study'?

Nathan - You hit the acronym there. They'll call it GWAS in our field: genome-wide association studies. It's important to break that down into a couple of different kinds of studies now, because we can look at very different kinds of genetic spellings. The kind we'll talk most about today are modestly common spellings. That is, if you look at many copies of a given human chromosome among us, you'll find that many of them are spelled with a C here, and at the same spot many other copies are spelled with, say, a T. So both the C and the T spellings are fairly common among us. Now there are also spellings that are extremely rare. Those very rare spellings may matter a lot in our health, but they're very hard to understand statistically; it's not in enough people to actually statistically answer that question. Instead we'll talk about the common ones. And those common spellings we can look at in many people, and we can say which of them got sick, stayed healthy, if they got exposed to the virus. If they did get sick, did they get severely ill? Did they end up in the hospital? Did they end up on a ventilator? Did they die? And when we look at enough people with these common genetic spellings, we can do what we're talking about, which is a genome-wide association study. And we end up with actually a very beautiful picture. We end up with these graphical pictures of our genomes that we call... in kind of a Yank-centric word, we call them Manhattan plots. But what they are, they're plots of our chromosomes. So you basically stretch out each of our long strings of DNA that we call a chromosome, we lay them out end to end, and we look at these common spellings in many people and say, "where do the spellings that a person carries tend to correlate most strongly with the outcome that interests us?"
If we're lucky then we get a plot that looks a little bit like a city skyline, where much of it is just pavement, but then you occasionally have this very strong spike that's a very tall kind of building. We call them peaks.
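
Editor's sketch: the scan Nathan describes can be imitated on simulated data. This is a toy illustration, not the Initiative's actual pipeline. Every number in it is made up for the example: 200 variants, 2,000 cases and 2,000 controls, a single hypothetical "causal" variant at position 120, and risky-spelling frequencies of 7% in controls and 14% in cases. For each variant it counts spellings in cases and controls, runs a simple chi-square test, and records minus log10 of the p-value, which is the height of the bar in a Manhattan plot.

```python
import math
import random


def allele_test_p(case_alt, case_total, ctrl_alt, ctrl_total):
    """P-value from a 1-degree-of-freedom chi-square test on a 2x2 table
    of allele counts (alternate vs reference spelling, cases vs controls)."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # no variation observed, so no evidence either way
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def simulate_scan(n_variants=200, n_people=2000, causal_index=120, seed=1):
    """Return one 'building height' (-log10 p) per simulated variant.
    All parameters are hypothetical values chosen for illustration."""
    rng = random.Random(seed)
    copies = 2 * n_people  # each person carries two copies of the chromosome
    heights = []
    for i in range(n_variants):
        freq_controls = 0.07
        # Only the assumed causal variant is more common among cases.
        freq_cases = 0.14 if i == causal_index else freq_controls
        case_alt = sum(rng.random() < freq_cases for _ in range(copies))
        ctrl_alt = sum(rng.random() < freq_controls for _ in range(copies))
        p = allele_test_p(case_alt, copies, ctrl_alt, copies)
        heights.append(-math.log10(max(p, 1e-300)))  # guard against log(0)
    return heights


heights = simulate_scan()
tallest = max(range(len(heights)), key=heights.__getitem__)
print(f"tallest peak at variant {tallest}, height {heights[tallest]:.1f}")
```

Plotting `heights` against variant position would give the skyline: mostly pavement, with one tower at the simulated causal variant. Real studies also adjust for ancestry, age, sex and other factors, which this sketch leaves out.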

Phil - You can see why they call it a Manhattan plot. I guess you've got one area of interest, that looks like your Empire State Building, and another looks like your Chrysler Building, or something like that.

Nathan - Yeah, and I feel like it would have been more appropriate for Manhattan of the era of the golden Roaring Twenties or something, when it was just the Empire State Building! Manhattan now looks a little bit chaotic. Maybe a more apt city to think of today would be a city like Dubai.

Phil - So flat, flat, flat; and a few really, really big spikes?

Nathan - Yeah, exactly. And to that point we have what looks like the Burj Dubai. For COVID-19, the really tall spike is on the short arm of our third chromosome, chromosome three. When we scan across, we see this really tremendous spike in a very intriguing, but also mysterious spot.

Phil - Chromosome three - I think I speak for most people when I don't associate that with any particular part of the body; you know, it's not a sex chromosome. So what on earth is going on there, that so many people have some version of a gene that is making this big difference?

Nathan - It's a great question. So when we zoom in and look at 'the ground', what the actual letters look like at this spike on chromosome three, we find that it sits in an intron of a gene. Let me break that down for a second. Each gene is basically a recipe, typically for the body to make a protein. The recipe has ingredient bits which we call exons, which are where the body reads, "okay, we're going to mix in two eggs with a cup of flour here," but it also has kind of accompanying text inside the gene that are called introns. And if you think of a recipe analogy, they're kind of like, how do you knit together this part of the recipe that you're making to the next part? So the introns don't have ingredients in them, they're kind of like accompanying text. They're less interesting, frankly, to most geneticists; they're harder to interpret. And, I should say, this particular gene on chromosome three where the spellings are doesn't appear to directly shape our responses to viruses. We have no reason from past studies, or even inklings from this study, to think that this gene matters. But rather, it sits between several other genes that really, really look like very intriguing candidates for how our bodies respond to viruses in particular.

Phil - That's pretty weird! You're saying it's just a bit of accompanying flavourtext, and it's not even part of our virus response; so why on earth is it a big deal?

Nathan - It's a great question! It's actually - and this turns out to often be the case - often the peak in our Manhattan plot for whatever disease we'll look at doesn't actually change the ingredient list for a protein. Rather, it may change how and when our cells make the recipe. What may happen here is that in certain cells, it may be like a bookmark, or a dog ear on the page, that marks this part of our genome to get read in particular cells that respond differently to the virus in some people. Now let me break that down for a second. On one side of this peak there's a very interesting gene called SLC6A20; again, a wonky name, but it makes a protein that controls ions flowing in and out of particular cells. And these SLC proteins can matter in many, many different kinds of diseases. This one has shown up as interesting for some kinds of viral responses, and it's also shown up as being expressed in some lung tissue under certain conditions. So it's intriguing. But equally intriguing, and this is where it gets really mysterious: on the other side of the peak, there is another set of genes that in my book are classic, textbook viral response genes to think about, and they're called chemokine receptors. And we have several of them here. One of them incidentally is already really famous, and that is CCR5. We mentioned other viruses before such as HIV; CCR5 makes a protein that strongly shapes who is vulnerable to getting HIV or not. We don't think that there's a direct connection here, I should say that at the start. Instead we think it might be one of these other chemokine receptors, 1, 3, 9, et cetera, and maybe in the other gene on the other side... that spellings at one or more of those genes may shape which of these proteins get read when, in a way that makes some people significantly more susceptible to the virus than others. To figure out whether that's the case, whether we're right about that, we need to do follow-up experiments in cells in the lab.
We're going to put those spellings into cells in a dish, and we're going to, in a simple way, model exposing that dish to the virus. We'll do experiments like that to try to figure out which, if any, of these genes, if turned up or down, makes something happen in the dish that looks like what we see happening in people resisting the virus or getting it. And it's important to set expectations here for what we may find, because we can have effects that we find that are very significant - meaning that they robustly show up when we look, if we ran the study again we would find the same Burj Dubai - but the effect, even though significant, is actually quite modest in size. It's a significant effect, but not a strong effect. Let me put that in terms that many of us may find more familiar. So if we looked at weather records over the history of say the UK, where you've had weather observatories for hundreds of years, we would find that it's significantly less likely to rain on a given day in the UK in May or June than it is in April. But as anybody who grew up in Britain knows, it can rain any day, and if you really want to know if it's going to rain today you don't look back and say, "aha, today's a day in June, therefore we're going to have sunny weather!" Instead you look at the local conditions. Those are going to much more strongly shape whether you should pack an anorak. And the same way here: these effects that we're seeing in people's genomes are very significant in study after study, in people around the world, but anybody with any spellings can get the virus and get it badly. I don't say that just to cover my butt; I say it because that's the reality. These are actually very modest effects. Your spellings at this huge peak shape your risk of COVID-19 no more than about twofold over anybody else or less than anybody else.
What we're learning here instead is where we can put our efforts most usefully in science to understand the virus; maybe think about developing therapies by knowing what proteins are involved in us; maybe think about fine tuning vaccines, maybe thinking about other questions. They're not for understanding individual risk yet.
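
Editor's sketch: the difference between a significant effect and a strong effect can be shown with a little arithmetic. The counts below are invented for illustration and are not taken from the study. The same proportions of risky spellings in cases and controls are scaled up to larger and larger samples. The odds ratio (the size of the effect) stays at roughly two, while the p-value (how confident we are that the effect is real) shrinks dramatically as the sample grows.

```python
import math


def odds_ratio_and_p(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, p-value) for a 2x2 table of allele counts,
    using a 1-degree-of-freedom chi-square test."""
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    denom = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
             * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, p_value


# Hypothetical counts per 100 chromosome copies: 14 risky copies among cases,
# 7 among controls. These numbers are assumptions made for the example.
base_counts = (14, 86, 7, 93)

results = []
for scale in (1, 10, 100):
    counts = tuple(x * scale for x in base_counts)
    odds_ratio, p_value = odds_ratio_and_p(*counts)
    results.append((scale, odds_ratio, p_value))
    print(f"sample x{scale}: odds ratio {odds_ratio:.2f}, p-value {p_value:.2g}")
```

This is why a very large study can produce a towering peak for an effect that only about doubles risk: the height of the peak reflects the strength of the evidence, not the size of the effect on any one person.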

Phil - Why don't you bring this to the person level for me? Because you're telling me that you found this bit of 'danger DNA', and you're sure it's danger DNA, but you're telling me that it's actually not that dangerous. So how many people have this bit of DNA, and how big a risk are they at if they've got it?

Nathan - Okay. Almost everybody has this segment of chromosome three in two copies. Different people have different spellings of those copies. The high risk copies... overall on the planet, about one in fourteen of our chromosome three copies have the high risk spellings. They're more or less common in different parts of the world; they turn out to be most common in South Asia. So British folk who have recent South Asian ancestry are more likely to have at least one of the riskier copies, but anybody can have one or two of the risky copies or one or two of the helpful copies for this. Second point: the risky copy may be risky for COVID-19 - modestly - but it might also be helpful for other diseases. We see this a lot in our genomes, where the 'danger version' for one disease turns out to be the helpful version for another disease. I can tell you for CCR5, for example: people who have the resistant version, so they don't catch HIV easily, they actually - it looks like - get West Nile virus more easily than other people do. So nobody's off scot-free, and also nobody here is doomed from this.
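
Editor's sketch: the "one in fourteen copies" figure can be turned into a rough share of people carrying zero, one or two risky copies. This assumes the two copies a person inherits are independent draws at the worldwide average frequency (the Hardy-Weinberg assumption, which is ours and not a claim from the interview). As Nathan notes, the frequency differs between parts of the world, so the real shares in any given population will differ from these.

```python
from fractions import Fraction


def genotype_shares(risk_copy_freq):
    """Share of people with 0, 1 or 2 risky copies, assuming each of a
    person's two chromosome copies is drawn independently (Hardy-Weinberg)."""
    q = Fraction(risk_copy_freq)  # frequency of the risky copy
    p = 1 - q                     # frequency of the other copy
    return {0: p * p, 1: 2 * p * q, 2: q * q}


# One in fourteen chromosome 3 copies worldwide, as quoted in the interview.
shares = genotype_shares(Fraction(1, 14))
for n_copies, share in shares.items():
    print(f"{n_copies} risky copies: {float(share):.1%} of people")
```

Under that assumption, two risky copies would be carried by 1 person in 196, and one risky copy by 26 in 196.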

Phil - I notice you haven't answered the question! Because I still need to know... and you've given me all these caveats, which are obviously so important, but the niggling question in the back of my head is: if I have this spelling, and maybe I even have two copies of it, I know I shouldn't rely on it and I know there's a million other factors, but how much more risk is there?

Nathan - Oh, okay. So if you have two copies of this particular set of spellings on chromosome three, everything else being equal - and that's a big if - your risk is about twofold more, about double that of some other people. Now please, please remember that I said, "all else equal". We don't know what the other factors are for SARS-CoV-2 except for the virus itself yet. We know that being exposed matters; so to catch it, you really have to be exposed, and if you're safe from exposure, you're not going to get SARS-CoV-2. Overall you're about twofold likelier to catch it if you have those spellings, but what if you have some other yet undiscovered resistance-conferring spelling elsewhere in your genome, or several other ones? That might pull your risk down towards average. Or what if you have a very rare unstudied spelling in another gene, another part of your genome, that may put you at severe risk? That's a good example of the further factors, even in your genome, that will shape your risk of actually catching the virus.

Phil - And this is a twofold higher chance of just catching it, or getting severe disease too?

Nathan - About the same. And so you just teased out another great question that our study was able to address, because we actually looked at both questions. We looked at who catches it; this set of genes matters there. We also looked at who... if you catch it, do you end up in the hospital? Do you get severe symptoms? Do you go downhill a lot? And there are some genes that show up much more strongly for the latter question.

Phil - Oh, okay. So we're going back to our Dubai skyline plot, and we're moving away from the Burj Khalifa - I guess the Burj Dubai, as it used to be called - to maybe some smaller buildings that you found?

Nathan - That's right. So this is where the Dubai plot will thicken. When we look across our genomes, we saw that really big peak, the Burj Khalifa on chromosome three. And that's the one that stood out to us, and we're like, "wow". But when we look across our other chromosomes, we see equally striking but somewhat smaller, shorter buildings, peaks, on several other chromosomes. Those include chromosome 9, 12, and 21 - we'll talk about those - and then there are some even tinier ones when you get down to the level of kind of like city blocks and buildings. But those are the other really prominent peaks. Now the one on chromosome 9 is actually a gene that most of us have heard of, which is the ABO gene: the gene where our different versions define our blood types. Now it turns out that again, significantly, people with type O blood look a little bit more resistant to getting COVID-19 if they're exposed, than other people do. Your blood type doesn't seem to shape how strongly sick you get. By contrast, your spellings on chromosome 3 also shape how strongly sick you get; and the other peaks we want to talk about, so on chromosome 12 and chromosome 21, they also appear to shape how strongly sick you get, and in some cases shape it more strongly perhaps than other parts of our genomes. Those peaks on 12 and 21 also fall near very intriguing genes, that if you asked a virologist, they would likely say, "aha, that makes some sense to me that those genes matter". On chromosome 12 we have a cluster of genes called OAS1, OAS2, and OAS3. These genes make proteins that help our bodies break down double-stranded RNA. A virus like SARS-CoV-2 - its genome is made of RNA, so when it needs to copy itself, it actually makes direct copies of that RNA. And when you make a direct copy, during the time you're copying it, you end up with double-stranded RNA. 
So these genes that we have called OAS1, 2, and 3, they help our bodies spot this kind of viral RNA; this kind of snapshot, "aha, virus within us is reproducing. Stop it." So a really cool finding on chromosome 12. On chromosome 21, the spellings that matter, the peak that matters in our plot, they fall near a gene called IFNAR2. And again, that kind of robot-name-sounding gene, it makes a protein that's an interferon receptor. So interferons are these proteins that our bodies make to interfere with microbial infection. They help our immune systems go to the rescue and fight germs. So it's not surprising to see our spellings near a gene like IFNAR2 affect how we respond to a particular germ. Now why this particular interferon receptor and not another? Why this particular virus, and maybe not every virus? Those are the open questions that we need to answer through follow-up work. And that may take months to decades to figure that out. But those are really cool, intriguing findings that shape our responses. And importantly, those shape the severity response. So IFNAR2 in particular: our spellings there don't appear to really affect much who gets the virus; rather they affect who gets really sick if they got it.

Phil - That's really interesting. So can we just go back over those 12 and 21 again? Chromosome nine had the ABO blood group genes; what did 12 and 21 have? Could you give me one line summaries?

Nathan - Okay, so 12, the spike was in a bit that's near genes that help our bodies break down double-stranded RNA from viruses.

Phil - And 21?

Nathan - And 21, the spike is right near a gene that helps our bodies regulate their interferon response. These interferons are molecules that our bodies make to fight germs, and on chromosome 21 there's a gene that is a receptor for some of those interferons, and helps shape whether they get turned on or work in particular ways.

Phil - So both presumably important parts of the immune response, which might explain why they affect whether you get severe or just mild COVID?

Nathan - That's exactly right. These are central to our immune responses. Now it's worth also noting that our immune systems are incredibly complicated genetically too. There's many, many genes involved in immune response. Throw a dart at a genome randomly, and you'll find an immune relevant gene somewhere nearby. But here we found spikes that are really near very strong clusters of immune response genes, that again, I think a virologist would say, "aha, those are really interesting." By contrast, there are some immune relevant genes we know a lot about that have not shown up. So I think if you asked anybody in our field a year ago, "what human gene is most likely to turn out to be relevant?" Many of them might've said ACE2.

Phil - And in fact, we did a whole programme about this very gene.

Nathan - Okay. So good. So you and your listeners may already know about this gene. ACE2 makes a protein that's central to how we regulate blood pressure, first of all. It happens to be the same protein that SARS-CoV-2 and other coronaviruses latch onto with their famous spike proteins in sort of a 'key meets lock' way, to say, "let us into the cell." So far we've looked at a lot of people at the fairly common spellings in ACE2, and it has not turned up as one of our strong places where spellings in us shape who gets the virus or doesn't. That doesn't mean it doesn't matter; it could be that very rare spellings in ACE2 may matter, and there are some studies ongoing to study that. Another set of genes that have not yet shown up as strongly relevant are the genes in the HLA region on chromosome 6. Now these genes are famously diverse. They're mindbogglingly diverse. And that makes it really tough statistically to get a handle on what goes on. They figure a lot in a lot of how we vary immune-wise. So HLA is kind of a 'stay tuned' question. That hunch may hinge on whether COVID-19 is partly an autoimmune-like disease, where our tissues may attack themselves too much after getting infected. So those are some examples, though, of where our own immune biology is complex enough, and yet partly understood enough, that geneticists like me might've gone in with a hunch - and been wrong.

Phil - Well Nathan, let's add all your buildings in your Dubai skyline together for a moment. You've got your chromosome 3 Burj Khalifa; you've got your 9, 12 and 21, which are kind of skyscrapers reaching for that height but not really making it; you've got a bunch of smaller buildings. With everything together, how much of the difference do you think you've explained between why different people get COVID differently?

Nathan - Fairly little yet. And you get at a great question, which is how we in our field try to go from one, very blunt, blurry insight... like, people with spellings here on chromosome 3 are twice as likely overall to get severe COVID-19 as other people. But we know that other factors - exposures, and in our genomes, and environments - matter. The strains of the virus will matter, right? Is it B.1.1.7 or a different strain? And putting those together to get finally predictive for one individual: that's what we want in healthcare, but it's really a far off goal here, even in the best cases. To put it in a very blunt model, you know how our very first pictures of a planet like Mars or Jupiter through a telescope in the 1600s... like Galileo spotted some moons of Jupiter, right? And then it took maybe another hundred years or so for somebody to spot the Great Red Spot on Jupiter. And then much, much later we can go in with a fine, really good imaging satellite, fly to Jupiter, and get these gorgeous images of all the cloudscape there. We're much closer to Galileo looking at a far off planet right now. And it may not take centuries to get us to that fine picture of SARS-CoV-2 or of other diseases, but it will likely take years at this point to be able to predict it with that kind of resolution of precision that we get with the image we get of a picture of Jupiter today.

Phil - So you don't think it would be a good idea, for example, to make a simple genetic test with all the peaks that you found in your Manhattan plot - as we've been calling our Dubai skyline - and get people to take it, and if they've got all the risky spellings, you say, "okay, you get the vaccine first"?

Nathan - I don't think we're close enough to making that kind of judgment yet. I want to couch that carefully, because there are so many other factors that we know matter more. We know that your age matters far more in terms of your risk if you catch the disease. We know that sex matters; and sex is a genetic question largely, so whether you have X and Y chromosomes together or two X chromosomes. Your age shapes it more; underlying health conditions in your record shape it more. In vaccine rollout I think we've prioritised the more robustly informative factors that we know about first, and that makes sense. Now within a year, will we know enough from a simple genetic model, even if it's still blunt, to say, "aha, this set of people with these three or four genetic spelling characteristics, we should bump them up"? We might get there. We're not there yet, but that is exactly why we're doing this research. We're also doing it so that we know, if there are maybe multiple paths to getting severely sick... so maybe some people get severely sick through effects in the lung; maybe other people get severely sick through effects on the heart or something; those might take different routes genetically that we can start to disentangle. We can see, "ah, the people who need a ventilator because of collapse in particular tissues in the lung, they reliably tend to have this set of genetic risk factors versus other people." That might help do what doctors always want to do, which is called triage; where you take people coming in and you want to divide them into two or three sets for priority access to your care, or a given treatment, resources that are limited, et cetera. And it may also in turn help people in the pharmaceutical world, and elsewhere, who are developing therapies. 
So can we look at the hits from chromosome 21 or chromosome 3, can that guide us into proteins in our bodies that we should think about developing therapies around, boosting the body's ability to keep this protein working well and fighting the virus? Practically speaking, that's where insights from our work as geneticists will most quickly and reliably find use for everyday people as patients.

