Structured documents, especially XML documents, are made up of a few logical components, such as titles, sections, subsections and paragraphs. The components in each structured...
Image anchor templates are used in document image analysis for document classification, data localization, and other tasks. Current tools allow human operators to mark out small s...
The identification and analysis of the knowledge available in document form is a key element of corporate knowledge management. In engineering-intensive organizations, it involves...
In this paper, we present a method for the automatic extraction of numerical fields (zip codes, phone numbers, etc.) from incoming mail documents. The approach is based o...
Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We pres...