Abstract. Automatic identification of a script in a given document image facilitates many important applications such as automatic archiving of multilingual documents, searching on...
Gopal Datt Joshi, Saurabh Garg, Jayanthi Sivaswamy
As XML has emerged as a data representation format and as great quantities of data have been stored in the XML format, XML document design has become an important and evident issu...
ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
A vast number of historical and badly degraded document images can be found in libraries, public, and national archives. Due to the complex nature of different artifacts, such p...
The representation of information collections needs to be optimized for human cognition. While documents often include rich visual components, collections, including personal coll...