
Communications of the ACM

ACM TechNews

Deciphering Old Texts, One Woozy, Curvy Word at a Time


Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) are being used to turn old texts riddled with smudges, crooked type, and other distortions into searchable files. Optical character recognition (OCR) software often misreads words in such texts and cannot correct its own mistakes, so human transcribers are enlisted.

Carnegie Mellon University researcher Luis von Ahn devised a way to recruit people who solve CAPTCHAs on sites such as Ticketmaster and Facebook to correct these errors: his system, reCAPTCHA, replaces the randomly generated CAPTCHA text with words that need to be deciphered. Von Ahn estimates that 70% to 90% of sites that use CAPTCHAs have adopted reCAPTCHA.

Two different OCR programs scan a photographic image of the text, and reCAPTCHA flags as suspicious any word that the two programs read differently or that does not appear in an English dictionary. Each suspicious word is turned into a CAPTCHA and paired with a second CAPTCHA whose correct reading is already known. The pair is then shown to several Web users seeking entry to secure sites, who are asked to decipher each word. Answers for the unknown word are compared with the OCR guesses and with an analysis of the surrounding context, and once the system is satisfied that an answer is correct, that word is considered solved.
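The flag-and-verify steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not reCAPTCHA's actual implementation: the names `is_suspicious` and `UnknownWord`, the half-vote weight given to each OCR guess, and the 2.5-vote acceptance threshold are all hypothetical, and the context analysis mentioned in the article is omitted. The sketch also assumes that a user's answer for the unknown word is counted only if that user deciphered the known control word correctly.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional

# Hypothetical weights and threshold, chosen for illustration only.
OCR_VOTE_WEIGHT = 0.5
HUMAN_VOTE_WEIGHT = 1.0
ACCEPT_THRESHOLD = 2.5


def is_suspicious(ocr_a: str, ocr_b: str, dictionary: set[str]) -> bool:
    """Flag a word if the two OCR readings differ or the reading is not a dictionary word."""
    return ocr_a != ocr_b or ocr_a.lower() not in dictionary


class UnknownWord:
    """Collects weighted votes for the reading of one suspicious word."""

    def __init__(self, ocr_a: str, ocr_b: str) -> None:
        self.votes: dict[str, float] = defaultdict(float)
        # Each OCR guess is seeded as a partial vote (hypothetical weighting).
        for guess in (ocr_a, ocr_b):
            self.votes[guess.lower()] += OCR_VOTE_WEIGHT
        self.accepted: Optional[str] = None

    def submit(self, control_answer: str, control_truth: str, unknown_answer: str) -> None:
        """Record one user's answers to a (control word, unknown word) pair."""
        if self.accepted is not None:
            return  # word already solved; further answers are ignored
        if control_answer.strip().lower() != control_truth.lower():
            return  # control word wrong: do not trust this user's unknown-word answer
        reading = unknown_answer.strip().lower()
        self.votes[reading] += HUMAN_VOTE_WEIGHT
        if self.votes[reading] >= ACCEPT_THRESHOLD:
            self.accepted = reading


if __name__ == "__main__":
    dictionary = {"the", "morning", "paper"}
    print(is_suspicious("morning", "moming", dictionary))  # True: the OCR readings disagree
    print(is_suspicious("paper", "paper", dictionary))     # False: agreed and in dictionary

    word = UnknownWord("moming", "morning")  # each reading starts at 0.5 votes
    word.submit("paper", "paper", "morning")  # "morning" now has 1.5 votes
    word.submit("papr", "paper", "morning")   # rejected: control word misread
    word.submit("paper", "paper", "morning")  # "morning" reaches 2.5 and is accepted
    print(word.accepted)  # morning
```

Gating each answer on the control word is what lets the system trust strangers' input: a user who gets the known word right is probably a careful human, so that user's reading of the unknown word is worth a vote.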

From The New York Times


Abstracts Copyright © 2011 Information Inc., Bethesda, Maryland, USA

