This paper presents a new document image binarization technique that segments the text from badly degraded historical document images. The proposed technique makes use of the imag...
In this paper, a high-speed document image classification algorithm is presented. The algorithm is based on the bottom-up strategy which can successfully segment and classify any ...
For user convenience, processing of document images captured by a digital camera has been attracted much attention. However, most existing processing methods require an upright im...
In recent years, statistical language models are being proposed as alternative to the vector space model. Viewing documents as language samples introduces the issue of defining a...
Digital libraries can take advantage of documents that have their content (semantics) explicitly represented as knowledge structures. These knowledge-rich documents can be created ...