These days, billions of Web pages are created with HTML or other markup languages. They have only a few uniform structures and contain various authoring styles compared to traditi...
This paper presents an architecture that enables the recognizer to learn incrementally and thereby adapt to document image collections for improved performance. We ar...
Text that appears in images contains important and useful information. Detection and extraction of text in images have been used in many applications. In this paper, we propose a ...
Codebook-based representations are widely employed in the classification of complex objects such as images and documents. Most previous codebook-based methods construct a single c...
Wei Zhang, Akshat Surve, Xiaoli Fern, Thomas G. Di...
This paper presents a text/graphic labelling approach for ancient printed documents. Our approach is based on the extraction and quantification of the various orientations that are pre...