James Kaufmann, IBM
Every year, millions of us end up locked to lavatory seats for longer than we'd like owing to something we ate. But tracking down the sources of these food-borne illnesses is extremely difficult. It usually takes public health officials months to carry out detailed interviews with victims, as well as conduct laboratory tests, to try to isolate the bugs responsible; and even then the culprit can sometimes escape identification.
Now, computer scientists at IBM's public health department in California have discovered a way to use food sales information collected by supermarkets, together with reports of disease outbreaks, to pinpoint the sources of food poisoning within days of an outbreak, potentially preventing thousands of people from becoming ill.
IBM's James Kaufmann, the project manager, explained the project to Chris Smith...
James - The technique combines the sales distributions for different products that might be available from a retailer, like a large supermarket, with public health case reports; these are confirmed laboratory reports. So, you know that an outbreak is happening and you know where the case reports occurred. By comparing those locations with the sales distribution for different products, it's possible to calculate the probability that each product in turn might be responsible for the outbreak, that is, that it might be the contaminated product.
So, you calculate that product probability for every product and then you can use statistical techniques to determine the most likely suspect product set. And that set gets smaller as the number of case reports increases. Surprisingly, after as few as 10 case reports, that set size is small enough that you can actually test them all. We think that the paper is going to be interesting not just to the public health community but also to the retailers, the private companies that sell and distribute food, because outbreaks lead to huge economic losses. So worldwide, the cost of food-borne disease is about $9 billion a year in medical costs. But the economic losses due to lost sales of, in many cases, perfectly good food are over $75 billion a year. So, there's an economic incentive for the food retailers to take advantage of the data that they already have, and they can proactively calculate sales distributions over time. Then when an outbreak does occur, they can use that to see if something in their inventory is involved in the outbreak.
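As a rough illustration of the idea James describes, the calculation can be sketched in a few lines of Python. This is not IBM's published method: it is a minimal sketch that assumes every product is equally suspect to begin with, that case reports are independent, and that each case turns up in a region in proportion to the contaminated product's share of sales there. The function names, the 95% cut-off, the smoothing floor and the sales figures are all hypothetical and invented for illustration.

```python
import math

def product_posteriors(sales_share, case_regions, floor=1e-9):
    """Probability that each product is the contaminated one.

    sales_share:  {product: {region: fraction of that product's sales}}
    case_regions: list of regions, one entry per confirmed case report
    floor:        assumed smoothing value so a zero share never gives log(0)
    """
    log_like = {}
    for product, share in sales_share.items():
        # Log-likelihood of the observed case locations under this product
        log_like[product] = sum(
            math.log(max(share.get(region, 0.0), floor))
            for region in case_regions
        )
    # Normalise (uniform prior assumed), subtracting the max for stability
    top = max(log_like.values())
    weights = {p: math.exp(ll - top) for p, ll in log_like.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

def suspect_set(posteriors, mass=0.95):
    """Smallest set of products whose combined probability reaches `mass`."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0.0
    for product, prob in ranked:
        chosen.append(product)
        covered += prob
        if covered >= mass:
            break
    return chosen

# Invented sales distributions for three products over three regions
sales_share = {
    "spinach": {"north": 0.70, "south": 0.20, "east": 0.10},
    "sprouts": {"north": 0.20, "south": 0.60, "east": 0.20},
    "lettuce": {"north": 0.34, "south": 0.33, "east": 0.33},
}

one_case = ["north"]
ten_cases = ["north"] * 7 + ["south"] * 2 + ["east"]

set_after_one = suspect_set(product_posteriors(sales_share, one_case))
set_after_ten = suspect_set(product_posteriors(sales_share, ten_cases))
print(set_after_one)
print(set_after_ten)
```

With these made-up numbers, the suspect set after ten case reports is no larger than after one, which mirrors the point that the set shrinks as reports accumulate. It only works because the three products sell in different regional patterns.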
Chris - Can you see that they might see this as a disadvantageous thing to do? Because if something does occur and it's on their patch, it's potentially their fault, and this could have medico-legal and insurance implications. And so, wouldn't it be better for them to remain under the radar under certain circumstances?
James - It's actually quite the opposite. So, we've talked to some of the big retailers in the US and they're very open. They have websites where they inform people about all of the recalls that are underway for different types of food, and whether their products are involved. If you think about it, a supermarket is a victim. They're receiving food ingredients and grocery items from all over the world. If one of their suppliers provides them with food that's contaminated, not only do they have to dispose of that food, of course; if there's an outbreak, they may have to dispose of all salad products. In Europe in 2011, the losses to European farmers were over 150 million euros because they had to discard a wide variety of salad products. There was an outbreak but they didn't know the cause. They just knew that it was salad.
Chris - Why do you think that no one has done what you've managed to do at IBM before now? Because it's not rocket science, is it, to tie geography with sales volumes and pin that on a disease?
James - It's not rocket science. It was, however, surprising. The reason the method works is that there are differences in the pattern of sales for different products. One might intuitively think that grocery sales would be so uniform that you couldn't possibly tell one food from another. But in fact, for the majority of products, there are significant differences. And so, that's the essence of how the method works, and that was what we studied in the paper.