Page segmentation algorithms found in the published literature often rely on predetermined parameters such as general font sizes, distances between text lines and document scan ...
Document representation and indexing is a key problem for document analysis and processing, such as clustering, classification and retrieval. Conventionally, Latent Semantic Index...
Optimisation of real-world Variable Data Printing (VDP) documents is a difficult problem because the interdependencies between layout functions may drastically reduce the number o...
Alexander J. Macdonald, David F. Brailsford, Steve...
In this paper, a new efficient word spotting methodology is presented that can be applied to historical printed documents without requiring any previous block or word segmentation...
In this article, we introduce a new problem: the construction of multi-structured documents. We first offer an overview of existing solutions to the representation of such docum...