Wikipedia predicts disease arrival

Disease outbreaks can be predicted 28 days in advance, using the viewing figures of Wikipedia pages, new research has shown...
14 November 2014


Disease outbreaks can be predicted around 28 days in advance using pageview data from Wikipedia, a new study has shown.

Scientists looked at influenza outbreaks in the United States, Poland, Japan and Thailand, as well as dengue fever in Brazil and Thailand, and tuberculosis in China and Thailand.

"People go to Wikipedia before they actually go to see the doctor", said Dr Sara Del Valle, who led the team of researchers from Los Alamos National Laboratory. "If people go to the Wikipedia article for 'flu, we can see whether that article was accessed in English, or in German, or in Portuguese, and that's what we decided to monitor."

In many cases the team were able to predict disease outbreaks several weeks in advance. They hope the research will lead to a disease monitoring system that works much like a weather forecast, but one that predicts outbreaks rather than the weather.

"Our goal is for public health officials to use this kind of information so that they can prepare and plan ahead of time," continued Del Valle. "If they know that people are going to be showing up with flu symptoms or dengue symptoms, then they can be prepared to have staff and resources available to treat those people." 

Del Valle and her colleagues encountered one problem with the method when trying to monitor Ebola. Media coverage of Ebola is currently so intense that far more people visit the disease's Wikipedia page out of interest than because they have the disease. This made it very difficult for the team to track the progress of Ebola accurately using this system.

The researchers' method, published in PLOS Computational Biology, relied on "teaching" a computer algorithm to recognise the signs of a disease outbreak before the outbreak had occurred. Interestingly, the team were able to train the algorithm using data from one country and then successfully use it in another.

Many developing countries collect very little public health data, which makes disease outbreaks difficult to predict. However, because the team's method can be transferred between countries, the research can also be applied in developing countries.
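
The idea of training on one country's data and reusing the result elsewhere can be illustrated with a very small sketch. It is not the team's published algorithm: it assumes a simple straight-line fit between weekly page views and the case counts reported four weeks (28 days) later, and every number in it is invented purely for illustration. The function names `fit_lagged_line` and `forecast` are hypothetical.

```python
from statistics import mean

LAG_WEEKS = 4  # assumption: weekly data, so 4 weeks is roughly 28 days


def fit_lagged_line(views, cases, lag):
    """Least-squares fit of cases[t + lag] ~ slope * views[t] + intercept."""
    xs = views[:len(views) - lag]   # page views that have a later case count
    ys = cases[lag:]                # case counts `lag` weeks after those views
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def forecast(model, views):
    """Apply a fitted (slope, intercept) pair to a series of page views."""
    slope, intercept = model
    return [slope * v + intercept for v in views]


# Invented weekly figures for a "training" country (not real data).
views_a = [100, 120, 180, 260, 300, 240, 160, 110]
cases_a = [20, 25, 30, 40, 60, 70, 100, 140]

# Invented page views for a second country with no case data of its own.
views_b = [80, 90, 150, 220]

model = fit_lagged_line(views_a, cases_a, LAG_WEEKS)
predicted_b = forecast(model, views_b)  # estimated cases four weeks ahead
print(predicted_b)
```

Here the model is fitted only on the first country's figures and then applied unchanged to the second country's page views, which is the step that matters where little public health data is collected.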

As Del Valle said: "this can completely revolutionise how live surveillance is performed for diseases."

