You know those irritatingly fuzzy words that you have to enter before you get those oh-so-desired NCAA tickets? You know, those “Captcha” words?
What if I told you that you are doing an actual job, that you are, in fact, working for Google? Yup. In yet another brilliant crowdsourcing scheme, similar to its Goog-411 project (remember calling 1-800-GOOG-411 for free information? Google was using your voice to teach its search engine how to perform search by voice), Google has us, the unsuspecting common man, deciphering inscrutable text for the Google Books project.
According to a New York Times article, “Deciphering Old Texts, One Woozy, Curvy Word at a Time,” the Captcha was developed by computer scientist Luis von Ahn and his team at Carnegie Mellon University.
The now ubiquitous reCaptcha program was created after the New York Times asked von Ahn’s group to digitize its archives. As any of you who have ever scanned a document know, it is an imperfect science. When I worked in records early in my legal career, I learned the hard way how difficult it is to OCR docs. There are always drunken-looking words that hang in the inner margins, stamps that are difficult to read, old typewriter ribbons that smudged.
So von Ahn (don’t you like that name? I think it rhymes with “Ah-Hah!”) converted the unreadable text to a bitmap image, ran it through OCR (optical character recognition), then turned it into a Captcha (Completely Automated Public Turing test to tell Computers and Humans Apart, Turing being the British computer pioneer). Then it is a simple matter for a human to look at the text and decipher it.
The only catch is that there are gazillions of these words stretched across time. So von Ahn figured out a way to get them all read. An extremely profitable way. ReCaptcha is the go-to authentication service for web sites.
Von Ahn estimates that 200 million Captchas are solved each day, at roughly 10 seconds per Captcha. He started his little project in 2006. He sold the little start-up in 2009. To Google. For an undisclosed amount.
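For the curious, here is the back-of-the-envelope math on those two numbers. This is only a rough sketch: it assumes the 10 seconds is the time one person spends on one Captcha, and the function name and the 8-hour workday are my own illustration, not anything from von Ahn or the Times.

```python
# Figures quoted above; reading "10 seconds" as time spent per Captcha is an assumption.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10


def daily_human_hours(captchas: int, seconds_each: int) -> float:
    """Total hours of human typing per day (hypothetical helper)."""
    return captchas * seconds_each / 3600  # 3600 seconds in an hour


hours = daily_human_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
workdays = hours / 8  # assumed 8-hour workday, purely for scale

print(f"{hours:,.0f} hours of typing per day")      # 555,556 hours of typing per day
print(f"{workdays:,.0f} eight-hour workdays a day")  # 69,444 eight-hour workdays a day
```

In other words, if those figures hold, the world puts in something like half a million hours of squinting every single day.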
Maybe our own smarty-pants in legal land can figure out how to get our unwitting staff to Captcha our own OCRs. Hmmm. The mind boggles.
All I want to know is: can I get Google to give me back pay for all of this?