Predicting outbreaks with the Internet

18 November 2014

Interview with

Sara Del Valle, Los Alamos National Laboratory

Have you ever looked up your symptoms online in a bout of paranoia? Although this isn't recommended, you might actually be helping the people who look after our public health.  Experts at Los Alamos in New Mexico have discovered that the number of hits on Wikipedia pages about different illnesses can predict when and where outbreaks of disease are happening, long before healthcare services get wind of them. Sara Del Valle explained the process to Kat Arney.

Sara -  The idea behind the study was to see if we could use Wikipedia not only to monitor but also to forecast diseases around the world.  We looked at influenza, which is transmitted from person to person.  We also looked at dengue, which is transmitted from mosquitoes to people, and tuberculosis, which is an airborne disease.

Kat -  What kind of countries were you looking at?

Sara -  We looked at dengue in Brazil and Thailand; influenza in Japan, Poland, Thailand and the United States; tuberculosis in China, Norway and Thailand.  And we looked at Ebola in Uganda.

Kat -  So, what can web statistics from Wikipedia tell us about these diseases?

Sara -  People go to Wikipedia before they actually go to see the doctor.  Wikipedia releases hourly counts of how many people access each article in different languages.  So, if people go to the Wikipedia article for flu, we can see whether the article was accessed in English or in German or in Russian or in Portuguese.  And that's what we decided to use to monitor the prevalence of diseases and also to forecast them, because people typically go to Wikipedia first, so the cases show up later on in regular public health data.
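As a rough illustration of the kind of data Sara describes, the sketch below sums hourly article counts into weekly totals per language, since official case counts are often reported weekly. The row layout, the `weekly_totals` helper, the article title and the sample numbers are all assumptions made up for this example; they are not the actual Wikipedia data format or the Los Alamos pipeline.

```python
from collections import defaultdict
from datetime import datetime


def weekly_totals(hourly_rows):
    """Sum hourly view counts into ISO-week totals per (language, article).

    hourly_rows: iterable of (timestamp, language, article, views) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    totals = defaultdict(int)
    for timestamp, language, article, views in hourly_rows:
        iso_year, iso_week, _ = timestamp.isocalendar()
        totals[(language, article, iso_year, iso_week)] += views
    return dict(totals)


# Invented sample rows: two hours in one week, one hour in the next week,
# plus one row for the same article title in another language edition.
sample_rows = [
    (datetime(2014, 11, 17, 9), "pt", "Dengue", 40),
    (datetime(2014, 11, 17, 10), "pt", "Dengue", 55),
    (datetime(2014, 11, 24, 9), "pt", "Dengue", 70),
    (datetime(2014, 11, 18, 9), "th", "Dengue", 12),
]

totals = weekly_totals(sample_rows)
for key in sorted(totals):
    print(key, totals[key])
```

Keeping the language in the key reflects the point made in the interview that the language of the article is what is visible in the counts.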

Kat -  How early can you predict that there might be an outbreak of this disease?

Sara -  It varies.  For flu, we were able to forecast up to 4 weeks in advance.  For dengue too, we were able to forecast 4 weeks in advance.  For tuberculosis, we could only forecast about a week in advance.
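One simple way to picture forecasting "weeks in advance" is a lagged straight-line fit: page views in one week are used to estimate reported cases a fixed number of weeks later. The sketch below is a minimal version of that idea under stated assumptions. The function names, the 4-week lag and the weekly numbers are invented for illustration, and the model actually used in the study may well differ.

```python
from statistics import mean


def fit_lagged_line(views, cases, lag):
    """Least-squares fit of cases[t + lag] = slope * views[t] + intercept."""
    xs = views[:len(views) - lag]   # views that have a matching later case count
    ys = cases[lag:]                # case counts `lag` weeks after those views
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast(views_now, slope, intercept):
    """Estimate the case count `lag` weeks after a week with `views_now` views."""
    return slope * views_now + intercept


# Invented weekly series in which cases follow views four weeks later.
weekly_views = [100, 120, 180, 260, 400, 380, 300, 200, 150, 110]
weekly_cases = [9, 9, 8, 9, 10, 12, 18, 26, 40, 38]

slope, intercept = fit_lagged_line(weekly_views, weekly_cases, lag=4)
print(round(forecast(300, slope, intercept), 1))
```

In practice the lag would be chosen per disease by checking which value predicts held-out weeks best, which is one way results such as "4 weeks for flu, about a week for tuberculosis" could arise.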

Kat -  With something like Ebola, I mean, I've looked at the Wikipedia page.  How do you know people aren't just curious about the disease rather than thinking, "Oh, maybe I've got this"?

Sara -  It's interesting that you mention that, because it appears that Ebola is just an interesting disease that is accessed on a regular basis.  Given the small number of cases in Uganda, the number of people going to the Ebola page just completely messed up the model.  When there's something going on, or there's a lot of media hype or attention to a specific disease, I think it would be very hard to pick up some of those patterns.

Kat -  So you can see - you think, it looks like there's an outbreak of this disease in this country.  What should you do with that information?

Sara -  Our goal is for public health officials to use this kind of information so that they can prepare and plan ahead of time.  If they know that people are going to be showing up with flu symptoms or with dengue symptoms, they can be prepared and have the staff and resources to treat those people.

Kat -  How are you going to try and take this forward and make this a better tool?

Sara -  Right now, we are working with the Wikimedia Foundation.  I think once they move forward and stratify the data, then we can do better analysis on country-disease pairs.  We only looked at 14 different contexts and we would like to look at more diseases and more countries.  We were able to train our model using data from one country and then to forecast a different country.  I think that can completely revolutionise bio-surveillance for diseases, because there are many countries that we know don't really collect a lot of data.  If we can show that we can use other countries that actually collect data and have very good public health departments to train the models, and then forecast diseases in developing countries, I think that would be a great contribution to the public health system.
