Science News

Wikipedia predicts disease arrival

Fri, 14th Nov 2014

Timothy Revell

Disease outbreaks can be predicted around 28 days in advance using pageview data from Wikipedia, a new study has shown.

Scientists looked at influenza outbreaks in the United States, Poland, Japan and Thailand, as well as dengue fever in Brazil and Thailand, and tuberculosis in China and Thailand.

“People go to Wikipedia before they actually go to see the doctor”, said Dr Sara Del Valle, who led the team of researchers from Los Alamos National Laboratory. “If people go to the Wikipedia article for 'flu, we can see whether that article was accessed in English, or in German, or in Portuguese, and that’s what we decided to monitor.”

In many cases the team were able to predict disease outbreaks several weeks in advance. They hope the research will lead to a disease-monitoring system that is used much like a weather forecast, except that it predicts outbreaks of disease rather than the weather.

“Our goal is for public health officials to use this kind of information so that they can prepare and plan ahead of time,” continued Del Valle. “If they know that people are going to be showing up with flu symptoms or dengue symptoms, then they can be prepared to have staff and resources available to treat those people.” 

Del Valle and her colleagues encountered one problem with the method when trying to monitor Ebola. The disease is receiving so much media attention at the moment that far more people are visiting its Wikipedia page out of interest than because they actually have it. This made it very difficult for the team to track the progress of Ebola accurately using this system.

The researchers’ method, published in PLOS Computational Biology, relied on "teaching" a computer algorithm to recognise the signs of a disease outbreak before the outbreak had occurred. Interestingly, the team were able to train the algorithm using data from one country and then use it successfully in another.
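To give a flavour of the idea, here is a minimal sketch in Python. It assumes the simplest possible model, a straight-line fit between page views on one day and case numbers a fixed number of days later; it is not the published study's model, and the function names, the 28-day lag and all the numbers are hypothetical and synthetic. It also ignores differences in page-view volume between languages, which a real transfer from one country to another would have to deal with.

```python
"""Toy sketch: forecast case counts from Wikipedia page views seen `lag` days earlier.

Assumption: a one-variable linear model fitted by ordinary least squares.
This illustrates the general idea only; it is not the published method,
and every name and number here is hypothetical.
"""
from statistics import mean


def fit_lagged_model(views, cases, lag):
    """Fit cases[t + lag] ~ a + b * views[t] by ordinary least squares."""
    x = views[: len(views) - lag]  # page views on day t
    y = cases[lag:]                # cases reported on day t + lag
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need aligned series longer than the lag")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx      # slope: extra cases per extra page view
    a = my - b * mx    # intercept
    return a, b


def forecast(model, latest_views):
    """Predict case counts `lag` days ahead from the latest page-view counts."""
    a, b = model
    return [a + b * v for v in latest_views]


if __name__ == "__main__":
    lag = 28  # hypothetical forecast horizon, in days
    # Synthetic "country A": cases equal 5 + 0.5 * page views from 28 days earlier.
    views_a = [100 + 10 * (d % 30) for d in range(120)]
    cases_a = [0.0] * lag + [5 + 0.5 * v for v in views_a[:-lag]]
    model = fit_lagged_model(views_a, cases_a, lag)
    # "Transfer": reuse country A's coefficients on another country's page views.
    print([round(p, 1) for p in forecast(model, [200, 300])])
```

Run as a script, this prints `[105.0, 155.0]`, the case counts the country A model expects 28 days after seeing 200 and 300 page views.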

Many developing countries collect very little public health data, which makes disease outbreaks there difficult to predict. However, because the team’s method can be transferred between countries, it could also be used in developing countries.

As Del Valle said: “This can completely revolutionise how live surveillance is performed for diseases.”


Comments


Most interesting.

I could see advantages if one could program Wikipedia, Google, or WebMD to snag the incoming IP addresses to increase location specificity, but I could also imagine serious privacy concerns.

News, of course, might confound the program. It is not uncommon for me to hear something on NPR and then look up additional details on the web. One might be able to control for this to some extent by knowing the timing of audio/visual news programs, but it would be harder to control for printed or internet news. And sometimes, if I'm driving, I try to look up the info later.

News, however, isn't entirely a bad thing. If I hear about the swine flu locally, I might be inclined to look up info about it whether or not I am sick or personally know someone who is sick. And thus, one still gets the local hits.

Perhaps the specificity of the searches could be used to distinguish "news" from "symptoms". So, if I hear about Ebola, then I'll look up Ebola. On the other hand, if I had symptoms such as yellow eyes, then I might look up the symptoms, hoping to bounce around until I arrive at a tentative diagnosis.

CliffordK, Sat, 22nd Nov 2014

