Google is serious about scanning books. Despite the objections raised over the years by authors and publishers, and the more recent delays in its settlement with the Authors Guild, Google has kept scanning millions of books all along, trying to digitize as many as it possibly can. It is so serious about capturing and indexing the knowledge stored in books that it holds a patent, issued on March 24, 2009, on how to scan books faster than was previously possible.
The basic technique uses two infrared cameras to determine how flat or curved each page is, and then adjusts the optical character recognition software that reads the text accordingly. In other words, the infrared cameras help figure out a book’s three-dimensional shape, and the software then corrects for any resulting distortions. This makes book scanning much faster, since pages don’t need to be flattened under glass plates and spines don’t need to be broken.
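The patent's actual algorithm isn't spelled out here, so the following is only a rough sketch of what correcting for a page's curvature could look like. It assumes the page's height profile has already been measured (the `xs` and `heights` inputs and the `flattened_positions` function are hypothetical names, not anything from Google's system) and maps each sampled column to where it would sit if the page were laid flat, by accumulating arc length along the surface.

```python
import math

def flattened_positions(xs, heights):
    """Map sample positions on a curved page to positions on the flat page.

    xs      -- horizontal coordinates of the samples (assumed depth-sensor output)
    heights -- page surface height at each sample, in the same units as xs
    Returns the cumulative arc length along the page surface at each sample.
    """
    if len(xs) != len(heights):
        raise ValueError("xs and heights must have the same length")
    if not xs:
        return []
    positions = [0.0]
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        dz = heights[i] - heights[i - 1]
        # Distance travelled along the paper between two neighbouring samples.
        positions.append(positions[-1] + math.hypot(dx, dz))
    return positions

# A perfectly flat page is left unchanged.
print(flattened_positions([0, 1, 2], [0, 0, 0]))
# A page rising steeply toward the spine is stretched back out.
print(flattened_positions([0, 3, 6], [0, 4, 8]))
```

Where the page slopes steeply, as it does near the spine, text looks horizontally squeezed to a camera overhead. Spacing the columns by arc length instead of by image position undoes that squeeze before the text is handed to OCR.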
There are other book scanning projects besides the Google Book Project. The Internet Archive, for instance, runs 18 scanning centers around the world, which all together digitize only 1,000 books a day. I am not sure what kind of technology the Internet Archive uses, but I wouldn’t be surprised if Google’s scanning operation is much faster. All those books add up to billions of pages of high-quality information just waiting to be indexed and searched. For Google, the faster it can get those books scanned, the faster it can start to serve ads against those searches. Now, I wonder how it flips the pages.