Printing and scanning text documents introduces character degradations that can be modeled. Interestingly, certain combinations of the parameters that govern the degra...
A perturbation model for generating synthetic textlines from existing cursively handwritten lines of text produced by human writers is presented. Our purpose is to improve the per...
The Robert Gordon University (RGU) participated in the Opinion Retrieval Task of the TREC 2007 Blog Track. At the core of the system we developed is a set of training doc...
Most common feature selection techniques for document categorization are supervised and require large amounts of training data in order to accurately capture the descriptive and d...
This paper describes a new method for the classification of an HTML document into a hierarchy of categories. The hierarchy of categories is involved in all phases of automated docum...