
ICDAR
2009
IEEE

Classifying Foreground Pixels in Document Images

We present a system that classifies pixels in a document image by marking type, such as machine print, handwriting, and noise. A segmenter module first splits an input image into fragments, sometimes breaking connected components. Each fragment is then classified by a fast, automatically trained multi-stage classifier that considers features of the fragment as well as its neighborhood. Features relevant for discrimination are selected automatically from among hundreds of measurements. Our system is trainable from example images in which each foreground pixel has a “ground-truth” label. The main distinction of our system is the level of accuracy achieved in classifying fragments at the sub-connected-component level, rather than larger aggregate groups such as words or text lines. We have trained this system to detect handwriting, machine-print text, machine-print graphics, and noise.
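The pipeline described in the abstract (segment into fragments, classify each fragment, then refine using its neighborhood) can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the segmentation rule, the features, or the learning method, so everything below is an assumption made for illustration. In particular, the function names (`segment`, `stage_one`, `stage_two`, `classify_pixels`), the strip-cutting rule, the pixel-count feature, the `"text"`/`"noise"` labels, and all thresholds are hypothetical, and hand-written rules stand in for the trained multi-stage classifier.

```python
from collections import Counter, deque


def segment(image, max_width=3):
    """Split foreground pixels (value 1) into fragments.

    Assumed rule: start from 4-connected components, then cut any component
    into vertical strips of at most max_width columns, so a fragment can be
    smaller than a connected component.
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    fragments = []
    for r in range(h):
        for c in range(w):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected component.
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Cut the component into strips of at most max_width columns.
            left = min(x for _, x in comp)
            strips = {}
            for y, x in comp:
                strips.setdefault((x - left) // max_width, []).append((y, x))
            fragments.extend(strips[k] for k in sorted(strips))
    return fragments


def centroid(fragment):
    n = len(fragment)
    return (sum(y for y, _ in fragment) / n, sum(x for _, x in fragment) / n)


def stage_one(fragment, noise_max=2):
    """Stage 1 (stand-in rule): label a fragment from its own pixel count."""
    return "noise" if len(fragment) <= noise_max else "text"


def stage_two(fragments, labels, radius=7.0):
    """Stage 2 (stand-in rule): majority vote over the stage-1 labels of the
    fragment itself and of fragments whose centroids lie within radius."""
    cents = [centroid(f) for f in fragments]
    refined = []
    for cy, cx in cents:
        votes = Counter()
        for j, (ny, nx) in enumerate(cents):
            if (cy - ny) ** 2 + (cx - nx) ** 2 <= radius ** 2:
                votes[labels[j]] += 1
        refined.append(votes.most_common(1)[0][0])
    return refined


def classify_pixels(image):
    """Return a dict mapping each foreground pixel (row, col) to a label."""
    fragments = segment(image)
    first = [stage_one(f) for f in fragments]
    final = stage_two(fragments, first)
    return {px: lab for frag, lab in zip(fragments, final) for px in frag}
```

Calling `classify_pixels` on a binary image given as a list of rows returns one label per foreground pixel. In this sketch an isolated pixel close to a large stroke is first labeled `"noise"` by `stage_one` and then relabeled `"text"` by `stage_two`, while a pixel far from any stroke stays `"noise"`. That is the kind of correction neighborhood features make possible, although the real system learns such decisions from ground-truth labeled examples.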
Prateek Sarkar, Eric Saund, Jing Lin
Added 21 May 2010
Updated 21 May 2010
Type Conference
Year 2009
Where ICDAR
Authors Prateek Sarkar, Eric Saund, Jing Lin