ICDAR 2007, IEEE

Iterated Document Content Classification

Download www.cse.lehigh.edu

We report an improved methodology for training classifiers for document image content extraction, that is, the location and segmentation of regions containing handwriting, machine-printed text, photographs, blank space, etc. Our previous methods classified each individual pixel separately (rather than regions): this avoids the arbitrariness and restrictiveness that result from constraining region shapes (to, e.g., rectangles). However, this policy also allows content classes to vary frequently within small regions, often yielding areas where several content classes are mixed together. This does not reflect the way that real content is organized: typically almost all small local regions are of uniform class. This observation suggested a post-classification methodology which enforces local uniformity without imposing a restricted class of region shapes. We choose features extracted from small local regions (e.g. 4-5 pixels radius) with which we train classifiers that operate on the outp...
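
The abstract is cut off before it describes the trained post-classifiers, so the following is not the authors' method. As a rough illustration of what "enforcing local uniformity" on a per-pixel label map can look like, this minimal sketch substitutes a plain majority vote over a small square window for the trained classifier and repeats the pass until the labels stop changing. The function names, the square (rather than circular) window, the default radius, and the iteration cap are all assumptions made for illustration.

```python
import numpy as np


def majority_smooth(labels: np.ndarray, n_classes: int, radius: int = 4) -> np.ndarray:
    """One pass: relabel each pixel with the most common class in its
    (2*radius + 1) x (2*radius + 1) neighbourhood.

    Assumption: a simple majority vote stands in for the trained
    post-classifier mentioned in the abstract. Ties go to the lowest class index.
    """
    h, w = labels.shape
    # Replicate border pixels so every pixel has a full window.
    padded = np.pad(labels, radius, mode="edge")
    counts = np.zeros((n_classes, h, w), dtype=np.int32)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            # Shifted view of the label map for this window offset.
            window = padded[dy:dy + h, dx:dx + w]
            for c in range(n_classes):
                counts[c] += (window == c)
    return counts.argmax(axis=0)


def iterate_smoothing(labels: np.ndarray, n_classes: int,
                      radius: int = 4, max_iters: int = 10) -> np.ndarray:
    """Repeat the smoothing pass until the label map is stable
    or the (hypothetical) iteration cap is reached."""
    current = labels
    for _ in range(max_iters):
        updated = majority_smooth(current, n_classes, radius)
        if np.array_equal(updated, current):
            break
        current = updated
    return current
```

Because no region shape is imposed, a pass of this kind removes isolated mislabeled pixels while leaving the boundary between two large uniform regions where it was. A trained classifier, as the abstract describes, could learn less crude rules than a majority vote.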

Chang An, Henry S. Baird, Pingping Xiu

Document Analysis | Document Images | ICDAR 2007 | Region Shapes | Small Local Regions

Related Content

» The Convergence of Iterated Classification

» Combining content and structure similarity for XML document classification using composite...

» Document zone content classification and its performance evaluation

» DL Meets P2P Distributed Document Retrieval Based on Classification and Content

» Iterative pre and postprocessing for MRC layers of scanned documents

» Exploring a new space of features for document classification figure clustering

» A Proposal for Annotation Semantic Similarity and Classification of Textual Documents

» A hierarchical generative model for Generic Audio Document Categorization

» Learning to Separate Text Content and Style for Classification

Post Info

Added	16 Aug 2010
Updated	16 Aug 2010
Type	Conference
Year	2007
Where	ICDAR
Authors	Chang An, Henry S. Baird, Pingping Xiu