We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: the segmentation output, described a...
A new technique to locate content-representing words for a given document image using representation of character shapes is described. A character shape code representation define...
Ontology is playing an increasingly important role in knowledge management and the Semantic Web. This study presents a novel episode-based ontology construction mechanism to extra...
In the traditional setting, text categorization is formulated as a concept learning problem where each instance is a single isolated document. However, this perspective is not appr...
A degradation model that describes many image degradations produced by desktop scanning is used to study the edge noise that is present in bilevel document images. The standard de...
Craig McGillivary, Chris Hale, Elisa H. Barney Smi...