Meet the SARS-CoV-2 family tree

As the coronavirus has left Wuhan to spread around the world, scientists have built its family tree...
15 April 2020

Interview with Richard Neher, University of Basel


A map of the world showing hotspots of an infection.


As the coronavirus has left Wuhan to spread around the world, its genes have mutated in tiny ways. By gene sequencing the virus as it goes, and matching up copies wherever the genes are most similar, scientists can build a record of where it has travelled and when. Richard Neher from the University of Basel in Switzerland is one of the people behind Nextstrain, an open source project that’s been responsible for creating and analysing coronavirus ‘family trees’. Phil asked him how - and why - he's done it…

Richard - It really has been a bit of a whirlwind development. So recall that this outbreak was first announced at the very end of December last year. By January 9th, we already had the first complete genome sequence of this virus available. And then over the weeks that followed, more and more of these genome sequences came in. By comparing these genome sequences to each other, we immediately got a very good sense that this is a single outbreak. These genomes were essentially identical. There were maybe one or two differences here and there, and for an RNA virus with a fairly high mutation rate, that implies that these genomes had a very recent common ancestor, only a few weeks in the past.
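To make the idea of "comparing genome sequences" concrete, here is a minimal Python sketch that counts the positions at which two sequences differ. It assumes the sequences are already aligned to the same length; the function name and the short toy fragments are invented for illustration and are not real viral data or Nextstrain's actual method.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Toy fragments, invented for illustration only.
# A real SARS-CoV-2 genome is roughly 30,000 bases long.
genome_1 = "ATGGCTAGCTTACGGA"
genome_2 = "ATGGCTAGCTTACGGA"  # identical to genome_1
genome_3 = "ATGGTTAGCTTACGAA"  # differs from genome_1 at two positions

print(count_differences(genome_1, genome_2))
print(count_differences(genome_1, genome_3))
```

A very small count across the whole genome is what suggests the samples share a recent common ancestor.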

Phil - And when you say the genomes were coming in, are people sending them to you, or what are they doing?

Richard - No, and that's an important point. GISAID, which is a database that's used for influenza data sharing.

Phil - GISAID?

Richard - GISAID. The Global Initiative on Sharing All Influenza Data. They have jumped in and provided their infrastructure and terms and conditions, their sort of sharing mechanisms, for the coronavirus sequencing community. So that has enabled labs all over the world to share their data for analysis.

Phil - How many genomes and sets of data are they getting?

Richard - We now have more than 2000 full genomes available, and we can't look at all of them at the same time anymore. There are simply too many. It's the first time that this sort of real-time sequencing, sharing and analysis is playing out.

Phil - Now you said those first few were basically identical. What's happened as the virus has gone all around the world?

Richard - As RNA viruses do, they mutate, not every other day, but about twice a month. That is sort of our current estimate.
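As a back-of-the-envelope illustration of how a mutation rate turns into a timescale, the Python sketch below estimates how long ago two genomes shared an ancestor. It assumes a steady rate of about two mutations per month along each lineage (the estimate quoted above) and ignores randomness and sampling dates; the function name is hypothetical, and this is a simplification rather than the method Nextstrain uses.

```python
MUTATIONS_PER_MONTH = 2.0  # rough estimate quoted in the interview


def months_since_common_ancestor(differences: int,
                                 rate: float = MUTATIONS_PER_MONTH) -> float:
    """Rough time, in months, since two genomes shared an ancestor.

    Both lineages accumulate mutations independently after they split,
    so the expected number of differences is about 2 * rate * time.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return differences / (2.0 * rate)


# Two genomes that differ at 2 positions.
print(months_since_common_ancestor(2))
```

Under these assumptions, one or two differences between genomes corresponds to an ancestor only a few weeks back.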

Phil - And these aren't mistakes that are going to kill the virus.

Richard - Well, some of them will kill the virus, but those are just dead ends, right? So the only ones that we see are those that don't kill the virus. They don't necessarily make the virus more aggressive or anything like that. Most of these mutations likely just don't really have too much significance. But what they do allow us to do is group viruses together. You know, a particular virus that got sampled in the US is similar to a virus that got sampled in Europe somewhere. That sort of gives us an idea of how the virus is dispersing and how different outbreaks in different places might be connected.
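The grouping idea can be sketched in a few lines of Python. The example below groups samples that carry exactly the same set of mutations relative to a reference genome. The sample names, locations and mutation labels are all invented for illustration, and real analyses build a full tree from shared and nested mutations rather than matching identical sets.

```python
from collections import defaultdict

# Invented toy data: sample name -> (location, mutations relative to a reference).
samples = {
    "sample_A": ("USA", frozenset({"C100T", "G2500A"})),
    "sample_B": ("Europe", frozenset({"C100T", "G2500A"})),
    "sample_C": ("Europe", frozenset({"A7000G"})),
    "sample_D": ("Asia", frozenset()),
}


def group_by_mutations(samples):
    """Group samples that share exactly the same set of mutations."""
    groups = defaultdict(list)
    for name, (location, mutations) in samples.items():
        groups[mutations].append((name, location))
    return dict(groups)


groups = group_by_mutations(samples)
for mutations, members in groups.items():
    label = ", ".join(sorted(mutations)) or "no mutations"
    print(f"{label}: {members}")
```

Here the toy US and European samples with identical mutations land in the same group, which is the kind of link that hints at connected outbreaks.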

Phil - So how has this been useful as the virus has gone to continent after continent?

Richard - Early on, when there's an outbreak in some country, politicians are very happy to say, well, we closed the borders and the problem is solved, right? That has never worked. Surprise. And the sequences, they can tell you that this decision is wrong, right? If you see many sequences that are very similar in your country, they probably were transmitted locally. So this is not a problem that you solve by closing borders, but a problem that you have to solve by clamping down on transmission in your community.

Phil - Have you found weird cases where viruses have spread in ways that you haven't expected? And you can tell that from seeing the full genomes of the virus.

Richard - So we've certainly seen, especially in the last week or two within Europe, that this viral population is very well mixed. There is not a single place this virus is coming from anymore. And this has to some extent been surprising, how rapid and thorough the spread has been. Anyone can go online and see this: you'll see both a family tree of the virus and a map that shows you where on the planet these samples come from. And one has to be very mindful of the gaps that are in those data, because mutations happen randomly. Sometimes there is no mutation for, like, four weeks. Sometimes there are three mutations in a week, and this means that seeing two things close together in the tree doesn't necessarily mean they were in the same place in time.
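To get a feel for how uneven random mutation can be, the Python sketch below treats mutations as a Poisson process with an average of two per 30 days. The Poisson model and the 30-day month are assumptions made here for illustration, not something stated in the interview, and the function name is hypothetical.

```python
import math


def poisson_probability(k: int, expected: float) -> float:
    """Probability of seeing exactly k events when `expected` are expected."""
    return math.exp(-expected) * expected ** k / math.factorial(k)


RATE_PER_DAY = 2.0 / 30.0  # assumed: about two mutations per 30-day month

# Chance of no mutation at all over four weeks (28 days).
p_none_in_four_weeks = poisson_probability(0, RATE_PER_DAY * 28)

# Chance of exactly three mutations within a single week (7 days).
p_three_in_one_week = poisson_probability(3, RATE_PER_DAY * 7)

print(f"No mutation in four weeks: {p_none_in_four_weeks:.3f}")
print(f"Three mutations in one week: {p_three_in_one_week:.3f}")
```

Neither outcome is impossible under this model, which is why the number of mutations between two samples is only a noisy guide to how far apart they are in time.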

Phil - Now obviously we're in this pandemic for, sort of, the long haul. What's the point of all this virus family tree mapping?

Richard - Well, the most obvious data that we have about this outbreak is the number of cases in different places. But what do these sequences give us? They add structure to these numbers. It's not just, sort of, 10,000 cases in New York or something like that. Suddenly you can break this down into multiple variants. So you know you have not one outbreak, but you might have three outbreaks that sort of originate in different places.

Phil - And how is that practically useful?

Richard - It helps you focus the infection control measures that you put in place. As you just said, we're in this pandemic for a couple more months for sure, right? So we'll have to understand where this virus is transmitted. Having the ability to use genome sequences to identify these transmission chains and transmission clusters gives you the means to target infection control measures.


