Scanned documents now a part of Google’s index
Google has announced a new technology that allows its popular search engine to index scanned documents. It uses Optical Character Recognition (OCR) to convert the images of text in a scanned PDF into words that can be searched. Previously, such documents were difficult to find via a search because the engine saw each page as an image and could not read the text in it.
“In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content,” Evin Levey, a Google product manager, said in a Google blog post. “We had occasional clues from references to the document – so you might get a search result with a title but no snippet highlighting your query. Today, that changes,” Levey added. “We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words – words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making the entire world’s information accessible and useful.”