Science News

CAPTCHA-ing old texts

Sun, 14th Sep 2008

From the show: Why do we Stop Noticing Smells?

Anyone who regularly uses the web will be familiar with CAPTCHAs: the little boxes of oddly shaped letters that you have to type out in order to access certain web pages. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and it’s a widely used security measure that lets a computer system tell whether you’re a real human rather than a spambot.

Now researchers from Carnegie Mellon University in Pittsburgh have harnessed this technology for a rather unusual end: transcribing old printed texts into digital form. Using the CAPTCHA system, the researchers have been asking computer users to decipher scanned words from books that current optical character recognition (OCR) programs cannot read. The team found that this method was more than 99% accurate, as good as professional human transcribers. The system is currently in use on more than 40,000 websites and has already been used to transcribe over 440 million words.
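How can the system trust an answer for a word that no computer can read? The Carnegie Mellon project (widely known as reCAPTCHA) has been described as showing two words at once: a "control" word whose answer is already known, which acts as the actual test, and an unknown word that OCR failed on. Guesses at the unknown word are kept only from users who type the control word correctly, and a word is accepted once enough of those guesses agree. The Python below is a minimal sketch assuming that scheme; `TranscriptionPool`, `submit` and `votes_needed` are hypothetical names, and the agreement threshold is illustrative rather than the project's real value.

```python
from collections import Counter, defaultdict


class TranscriptionPool:
    """Hypothetical sketch: pool human guesses for words that OCR could not read."""

    def __init__(self, votes_needed: int = 3):
        # Illustrative threshold: how many matching guesses accept a word.
        self.votes_needed = votes_needed
        # For each unknown word id, a count of how often each guess was typed.
        self.votes: dict[str, Counter] = defaultdict(Counter)
        # Unknown word id -> accepted transcription.
        self.transcribed: dict[str, str] = {}

    def submit(self, control_answer: str, control_typed: str,
               unknown_id: str, unknown_typed: str) -> bool:
        """Return True if the user passes the CAPTCHA (control word correct)."""
        if control_typed.strip().lower() != control_answer.lower():
            # Failed the test: treat as a possible bot and discard the other guess.
            return False
        guess = unknown_typed.strip().lower()
        self.votes[unknown_id][guess] += 1
        # Accept the most popular guess once it has enough agreeing votes.
        word, count = self.votes[unknown_id].most_common(1)[0]
        if count >= self.votes_needed and unknown_id not in self.transcribed:
            self.transcribed[unknown_id] = word
        return True


if __name__ == "__main__":
    pool = TranscriptionPool(votes_needed=2)
    pool.submit("morning", "morning", "w1", "overlooks")  # passes, 1 vote
    pool.submit("river", "rivr", "w1", "xyz")             # fails, guess ignored
    pool.submit("castle", "Castle", "w1", "Overlooks")    # passes, 2 votes
    print(pool.transcribed)  # {'w1': 'overlooks'}
```

Because a spambot cannot reliably pass the control word, requiring a correct control answer filters out most junk guesses before any voting happens.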
