Digitizing ancient books, especially those related to the humanities, is practiced in many countries. The number of full-text databases in the humanities is increasing. Studies hav...
This paper describes how to make use of e-books that look like printed books in a knowledge network. After an overview of digitization efforts and current digital library initia...
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Perspective distortion always occurs while scanning thick, bound documents, resulting in two problems in the scanned grayscale image: (i) shade along the 'spine' of the book...
Named Entity Recognition (NER) is the task of locating and classifying names in text. In previous work, NER was limited to a small number of predefined entity classes (e.g., peop...