In 2002, on a Friday, Larry Page began to end the book as we know it. Using the 20 percent of his time that Google then allotted to its engineers for personal projects, Page and Vice-President Marissa Mayer developed a machine for turning books into data. The original was a crude plywood affair with simple clamps, a metronome, a scanner, and a blade for cutting the books into sheets. The process took 40 minutes a book. The first refinement Page developed was a means of digitizing books without cutting off their spines, a gesture of tender-hearted sentimentality towards print. The great disbinding was to be metaphorical rather than literal. A team of Page-supervised engineers developed an infrared camera that took into account the curvature of pages around the spine. They resurrected a long-dormant piece of Optical Character Recognition software from Hewlett-Packard and released it to the open-source community for improvements. They then crowd-sourced textual correction at minimal cost through a brilliant program called reCAPTCHA, which employs an anti-bot service to get users to read and type in words the OCR software can't recognize. (A miracle of cleverness: everyone who has entered a security identification has also, without knowing it, aided the perfection of the world's texts.) Soon after, five of the world's largest libraries signed on as partners. And, more or less just like that, literature became data.
Source: Los Angeles Review of Books.