In this paper, we present a study on hierarchically characterizing image content from coarse to fine levels, using a series of shape features as a case in point. I...
This paper presents a generic architecture for handwritten document analysis. It covers all analysis steps from the content description of the document (layout analysis, handwrit...
Stemming can improve retrieval accuracy, but stemmers are language-specific. Character n-gram tokenization achieves many of the benefits of stemming in a language-independent way,...
This paper reports a document retrieval technique that retrieves machine-printed Latin-based document images through word shape coding. Adopting the idea of image annotation, a wo...
Authorship attribution is the task of identifying the author of a given text. The main concern of this task is to define an appropriate characterization of documents that captures ...
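To illustrate the character n-gram tokenization mentioned in the stemming abstract above, here is a minimal Python sketch. It is not the cited paper's implementation: the function name `char_ngrams`, the default length `n=4`, the `_` boundary padding, and the choice to keep n-grams within word boundaries are all assumptions made for this example.

```python
def char_ngrams(text, n=4, pad="_"):
    """Return overlapping character n-grams for each whitespace-separated word.

    Assumptions (not taken from the abstract): words are lowercased,
    padded with a boundary marker, and n-grams do not span words.
    """
    grams = []
    for word in text.lower().split():
        padded = f"{pad}{word}{pad}"
        if len(padded) <= n:
            # Word too short to slide a window over: keep it as one token.
            grams.append(padded)
        else:
            # Slide a window of width n across the padded word.
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


# Morphological variants share their leading n-grams, so they can match
# in an index without any language-specific stemmer.
shared = set(char_ngrams("juggling")) & set(char_ngrams("jugglers"))
print(sorted(shared))
```

Here "juggling" and "jugglers" end up sharing the grams that cover their common stem, which is roughly the effect a stemmer would produce by conflating the two words.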