This paper presents a new document image binarization technique that segments the text from badly degraded historical document images. The proposed technique makes use of the imag...
Document image matching is the key technique for document registration and retrieval. In this paper, a new matching algorithm based on document component block list and component ...
The evolution of the memory hierarchy in recent decades has opened new challenges in the development of indexing structures in general and spatial access methods in particular. In t...
Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea ident...
Ian H. Witten, Gordon W. Paynter, Eibe Frank, Carl...
Named Entity Recognition (NER) is an important subtask of document processing tasks such as Information Extraction. This paper describes an NER algorithm that uses a Multi-Layer Percept...