CAPTCHA-ing old texts

14 September 2008


Anyone who's a regular web user will be familiar with CAPTCHAs - the little boxes of oddly shaped letters that you have to type out in order to access certain web pages. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and it's a highly effective security measure that lets a computer system tell whether you're a real human rather than a spambot.

Now researchers from Carnegie Mellon University in Pittsburgh have harnessed this technology for a rather unusual end - transcribing old printed texts into digital form. Using the CAPTCHA system, the researchers have been asking computer users to decipher scanned words from books that current character recognition programmes can't recognise. The team found that this method had an accuracy level of more than 99% - as good as professional human transcribers. Currently, the system is in use on more than 40,000 websites and has transcribed over 440 million words.

