ICPR
2000
IEEE

Statistical-Based Approach to Word Segmentation

Download www.math.ucla.edu

This paper presents a text word extraction algorithm that takes a set of bounding boxes of glyphs and their associated text lines of a given document and partitions the glyphs into a set of text words, using only the geometric information of the input glyphs. The algorithm is probability-based. An iterative, relaxation-like method is used to find the partitioning solution that maximizes the joint probability. To evaluate the performance of our text word extraction algorithm, we used a 3-fold validation method and developed a quantitative performance measure. The algorithm was evaluated on the UW-III database of some 1600 scanned document image pages. An area-overlap measure was used to find the correspondence between the detected entities and the ground truth. For a total of 827,433 ground-truth words, the algorithm identified and segmented 806,149 words correctly, an accuracy of 97.43%.
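The evaluation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact matching rule, so the box layout `(x0, y0, x1, y1)`, the helper names, and the 0.9 overlap threshold are all assumptions made for illustration. The last lines only re-derive the reported accuracy from the two word counts given in the abstract.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box (0 if degenerate)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def count_correct(truth: List[Box], detected: List[Box],
                  thresh: float = 0.9) -> int:
    """Count ground-truth words matched one-to-one by a detected word.

    Assumed rule: the intersection must cover at least `thresh` of BOTH
    boxes, so merged or split words do not count as correct.
    """
    used = set()
    correct = 0
    for t in truth:
        for j, d in enumerate(detected):
            if j in used or area(t) == 0 or area(d) == 0:
                continue
            inter = overlap_area(t, d)
            if inter / area(t) >= thresh and inter / area(d) >= thresh:
                used.add(j)
                correct += 1
                break
    return correct


# Toy example: three ground-truth words; the detector merged the last two.
truth = [(0, 0, 10, 5), (12, 0, 20, 5), (22, 0, 30, 5)]
detected = [(0, 0, 10, 5), (12, 0, 30, 5)]
correct = count_correct(truth, detected)

# Accuracy recomputed from the counts reported in the abstract.
accuracy = 100 * 806149 / 827433
print(correct, round(accuracy, 2))
```

In the toy example only the first word counts as correct, because the merged detection covers each of the other two ground-truth boxes but is itself less than half covered by either one. Requiring overlap in both directions is one simple way to penalize merge and split errors; other matching rules are possible.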

Yalin Wang, Robert M. Haralick, Ihsin T. Phillips

Related Content

» Image Statistics Based on Diffeomorphic Matching

» A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers

» A Hybrid Approach to Word Segmentation and POS Tagging

» An ErrorDriven WordCharacter Hybrid Model for Joint Chinese Word Segmentation and POS Tagg...

» A Chunking Strategy Towards Unknown Word Detection in Chinese Word Segmentation

» Adapting Chinese Word Segmentation for Machine Translation Based on Short Units

» Wordbased and Characterbased Word Segmentation Models Comparison and Combination

» Is Arabic Part of Speech Tagging Feasible Without Word Segmentation

» Word Segmentation of Vietnamese Texts a Comparison of Approaches

Post Info

Added	31 Jul 2010
Updated	31 Jul 2010
Type	Conference
Year	2000
Where	ICPR
Authors	Yalin Wang, Robert M. Haralick, Ihsin T. Phillips
