Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...
This paper presents a new document image binarization technique that segments the text from badly degraded historical document images. The proposed technique makes use of the imag...
In this paper, a high-speed document image classification algorithm is presented. The algorithm is based on the bottom-up strategy which can successfully segment and classify any ...
In recent years, statistical language models are being proposed as alternative to the vector space model. Viewing documents as language samples introduces the issue of defining a...
Digital libraries can take advantage of documents that have their content (semantics) explicitly represented as knowledge structures. These knowledge-rich documents can be created ...