Genetic tree suggests origin in rural China

The first genome sequences of the coronavirus suggest it may not have come from Wuhan at all...
01 September 2020

Interview with Peter Forster, University of Cambridge


A view of low clouds and mountain ranges in China's Yunnan province.


Wuhan, where the pandemic was first picked up, is a city in the Chinese province of Hubei, roughly in the centre of the country. Earlier, Peter Daszak mentioned two provinces in the far south: Guangdong, where SARS 1 made the leap into humans; and Yunnan, the rural province that the bats carrying SARS 1 originally came from. Guangdong and Yunnan are notable for reasons that Peter explains...

Peter Daszak - For COVID, the wildlife market seems to have been a place where lots of people were spreading the virus. It doesn't look like that was the actual origin of the virus. It seems that some of the first few patients to be identified had no contact with the market. So it looks like it came from somewhere else, got into the market system, and then spread rapidly in people.

We’ve known since January that the first reported COVID case had no link to the market, and neither did a dozen cases from the initial batch. So if the jump into humans did not happen there, then where? Evidence from the genetic sequences of viruses sampled early in the outbreak indicates that it may not have been Wuhan at all, as Cambridge University’s Peter Forster told Phil Sansom…

Peter - We analysed the first 160 coronavirus genomes, taken mainly from patients in East Asia, but also from the first patients in the Western world: Australia, Europe and North America. We applied what we call a network algorithm to reconstruct how the viruses are related to each other; it's like a family tree. We found that, at the beginning of March, there were three main types of virus. We call them A, B and C. When we compared these A, B and C types with the bat coronavirus, it clearly showed that the A type was the ancestral type. That was a surprise, because the A type is not common in the Chinese city of Wuhan; it's the B type that is most common there. Up to then, I had believed that the virus had come from the fish market in Wuhan, and somehow that didn't seem compatible with our analysis.
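The rooting idea Peter describes, comparing each type with the bat virus, can be illustrated with a minimal Python sketch. It assumes toy five-letter "genomes" invented for illustration (they are not real SARS-CoV-2 or bat sequences) and simply counts differences from the bat sequence. This is not the network algorithm used in the study, only the simpler principle that the type with the fewest differences from the bat outgroup is the best candidate for the ancestor.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_to_outgroup(types: dict[str, str], outgroup: str) -> tuple[str, dict[str, int]]:
    """Return the type with the fewest differences from the outgroup, plus all distances."""
    distances = {name: hamming(seq, outgroup) for name, seq in types.items()}
    return min(distances, key=distances.get), distances


# Hypothetical toy sequences: A and B differ at two sites, B and C at one.
human_types = {
    "A": "CTGAC",
    "B": "CCGAT",
    "C": "CCTAT",
}
bat = "CTGGC"  # hypothetical bat outgroup, sharing A's states at the variable sites

root, distances = closest_to_outgroup(human_types, bat)
print(root, distances)
```

With these made-up sequences the bat virus is one difference from A, three from B and four from C, so A would be picked as the root. Real analyses work on aligned genomes of about 30,000 letters and must also handle sites where the outgroup differs from every human virus.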

Phil - I thought there was only one coronavirus though. What do you mean there's A, B and C?

Peter - These viruses mutate all the time; they change. A, B and C differ from each other by mutations: A differs from B by two mutations, one of which changes an amino acid, so the virus now looks slightly different; B has mutated into C by another amino acid change, so the virus again looks a bit different. In the course of March, we've seen quite an amazing development: a B subtype, which was about 3% of our sample in early March, has now become the dominant type across the world.

Phil - Then if you get this B type of coronavirus, do you get sick in a different way compared to if you had the original A type, for example?

Peter - An American group looked at a hundred patients who have this new B subtype, which has become dominant, and another hundred patients who have other coronavirus types. They saw no major clinical differences. But what they did see was that patients with the new B subtype have a higher viral load. It then seems obvious that if you have more virus and you cough and sneeze, you can infect people more easily, and therefore this virus type will become dominant. That is what seems to have happened.

Phil - What does your tree, then, tell you about when these different strains evolved, basically?

Peter - Roughly every two weeks, the virus undergoes one mutation. So if we count how many mutations there are in our reconstructed network of viruses, we can see that the ancestral virus started spreading between the 13th of September and the 7th of December. That is what we call the 95% confidence interval.
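The arithmetic behind a date like this can be sketched as a simple molecular clock: average the number of mutations separating sampled genomes from the root, then convert mutations into days. The Python below is a hypothetical illustration, not the authors' code. It assumes one mutation per 14 days (the interview's "roughly every two weeks"), made-up mutation counts, a single sampling date for every genome, and a crude error term that treats each genome's count as an independent Poisson draw. Real genomes share branches of the tree, so that error term will be too narrow; the estimate in the interview was derived from the reconstructed network itself.

```python
from datetime import date, timedelta
from math import sqrt

DAYS_PER_MUTATION = 14  # "roughly every two weeks", taken from the interview


def estimate_root_date(mutations_from_root: list[int], sampling_date: date,
                       days_per_mutation: float = DAYS_PER_MUTATION) -> tuple[date, date, date]:
    """Return (point estimate, earliest, latest) for when the root virus began spreading.

    Assumption: all genomes were sampled on the same day, and each count is an
    independent Poisson draw, which ignores shared ancestry and so understates error.
    """
    n = len(mutations_from_root)
    if n == 0:
        raise ValueError("need at least one genome")
    rho = sum(mutations_from_root) / n  # mean mutations back to the root
    sigma = sqrt(rho / n)               # crude standard error under the Poisson assumption

    def days_back(mutations: float) -> date:
        return sampling_date - timedelta(days=round(mutations * days_per_mutation))

    earliest = days_back(rho + 1.96 * sigma)          # more mutations -> further back in time
    latest = days_back(max(rho - 1.96 * sigma, 0.0))  # fewer mutations -> closer to sampling
    return days_back(rho), earliest, latest


# Hypothetical example: four genomes, each two mutations from the root, sampled mid-January.
point, earliest, latest = estimate_root_date([2, 2, 2, 2], date(2020, 1, 15))
print(point, earliest, latest)
```

In this toy example two mutations at two weeks each put the point estimate 28 days before sampling. The spread between `earliest` and `latest` shows why a handful of mutations translates into an interval spanning weeks.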

Phil - That's months before the first reported case.

Peter - Well, to be honest, I don't think so, because the first reported case, published in the Lancet in January, was a patient who fell ill on the 1st of December. So this patient must have been infected at the end of November, and that is precisely what our time estimate says. There are scientists who have calculated that the disease began in December, mid-December or so, but this is because they don't have these network algorithms to calculate accurately where the root type is, how fast the virus is mutating and so forth. But we're talking here about differences of only weeks.

Phil - Okay, that's the when. What about the where? Because you said you were skeptical about the wet market theory in Wuhan city...

Peter - I think our results have made me skeptical about the Wuhan fish market theory, especially since the first patient diagnosed had no contact with the fish market. Now, as you may know, in January there was the Chinese New Year festival, when people travel. So I decided, "well, let's set a cutoff date in the middle of January and see where the A types occur". For this very early period, we have 23 virus genomes from Wuhan, and only three of them are A types; the other 20 are B types. Whereas in other parts of China, you have more A types. For example, in Guangdong, in southern China, you have about 50% A types, and you have an A type in Yunnan province. These are areas where the bat populations are. So if someone twists my arm and asks where this virus came from, I think it's slightly more likely it came from the southern provinces than from Wuhan.

Phil - How sure are you Peter?

Peter - Not sure at all about the origin, because we have such small sample sizes. There are only 40 genomes available for the period between Christmas and mid-January. You can hardly do statistics on such a very small sample size. So I'm not sure.
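To put a number on how little a sample this size can pin down, here is a small Python sketch of a Wilson score interval, a standard approximate 95% interval for a proportion. The choice of method is ours, not one named in the interview; the only inputs are the Wuhan figures Peter quotes, 3 A types out of 23 genomes sampled before mid-January.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Wuhan genomes before the mid-January cutoff: 3 A types out of 23 (from the interview).
low, high = wilson_interval(3, 23)
print(f"A-type share in early Wuhan samples: {3 / 23:.0%} (95% CI {low:.0%} to {high:.0%})")
```

The interval runs from roughly 5% to 32%, so even the Wuhan proportion on its own is only loosely pinned down, which fits Peter's caution about drawing firm conclusions from so few genomes.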
