Document-centric XML collections contain text-rich documents, marked up with XML tags. The tags add lightweight semantics to the text. Querying such collections calls for a hybrid...
In a print production system, the ability to match a printed document with its original electronic form enables services that improve robustness of the production process, such as...
Preprocessing, a major component of a character recognition system, directly affects recognition performance. A preprocessing method for NaXi Pictograph Charac...
Almost all document analysis approaches need to perform a global analysis of the page orientation as a separate process at an early stage. It would be preferable to estimate the o...
Supporting entity extraction from large document collections is important for enabling a variety of data analysis tasks. In this paper, we introduce the "ad-hoc"...