This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...
Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...
In this paper, based on the study of the specificity of historical printed books, we first explain the main error sources in classical methods used for page layout analysis. We sho...
Jean-Yves Ramel, S. Leriche, M. L. Demonet, S. Bus...
Large-scale digitization projects aimed at periodicals often have as input streams of completely unlabeled document images. In such situations, the results produced by the automat...
Iuliu Vasile Konya, Christoph Seibert, Sebastian G...
This paper describes the handwriting recognition competition held at ICDAR 2009. This competition is based on the RIMES-database, with French written text documents. These documen...