Scientists have developed a computer program to trawl through online publications and turn charts and graphs back into the raw data from which they were drawn. The work, carried out by Penn State University researcher William Brouwer and his colleagues and published on the arXiv preprint server (arxiv.org/0809.1802), describes a system that can search digital documents for the hallmarks of a graph and then, by analysing the image, re-create the raw figures.
The idea is that this system will make the data much more useful: although graphs are a clear and efficient way to display data, they make it much harder to compare one person's results with another's.
Now, the team hope, their self-learning system will enable users to retabulate the information from graphs and plots so that it can be checked, reanalysed and used in other studies.
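To give a flavour of what "retabulating" a graph involves, here is a minimal Python sketch of just one step: converting the pixel positions of plotted points back into data values using two known tick marks on each axis. It is an illustration only, not the method from the paper. The function names, the assumption of linear axes and the calibration numbers are all hypothetical, and finding the points and tick marks in the image in the first place is assumed to have been done already.

```python
from typing import Callable, List, Tuple


def make_axis_map(pixel_a: float, value_a: float,
                  pixel_b: float, value_b: float) -> Callable[[float], float]:
    """Build a pixel-to-value converter for one linear axis.

    Hypothetical helper: it takes two reference ticks, each given as a
    pixel position and the data value printed beside it.
    """
    if pixel_a == pixel_b:
        raise ValueError("reference ticks must sit at different pixel positions")

    def to_value(pixel: float) -> float:
        # Straight-line interpolation between the two reference ticks.
        return value_a + (pixel - pixel_a) * (value_b - value_a) / (pixel_b - pixel_a)

    return to_value


def retabulate(points_px: List[Tuple[float, float]],
               x_map: Callable[[float], float],
               y_map: Callable[[float], float]) -> List[Tuple[float, float]]:
    """Convert (x, y) pixel positions of plotted points into data values."""
    return [(x_map(px), y_map(py)) for px, py in points_px]


# Made-up calibration for illustration: the x axis runs from pixel 100
# (value 0) to pixel 500 (value 10). Image rows count downwards, so the
# y axis runs from pixel 400 (value 0) up to pixel 100 (value 30).
x_map = make_axis_map(100, 0.0, 500, 10.0)
y_map = make_axis_map(400, 0.0, 100, 30.0)

# Made-up pixel positions of three plotted points.
points_px = [(100, 400), (300, 250), (500, 100)]
table = retabulate(points_px, x_map, y_map)

for x, y in table:
    print(f"{x:.2f}\t{y:.2f}")
```

Because each axis is calibrated from two of its own ticks, the upside-down row numbering of an image takes care of itself. A logarithmic axis would need a different mapping.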
And apart from academics, there are likely to be spin-offs for sports enthusiasts and fraud-busters too, the team suggest. A sports fan, for instance, could use the system to extract raw data presented graphically on a webpage about a player's or team's recent form, whilst statistics from someone suspected of plagiarism could be compared with other results to look for telltale irregularities or similarities.