
ICDAR
2011
IEEE

A Handwritten Character Extraction Algorithm for Multi-language Document Image

In this paper, we propose a novel method for extracting handwritten characters from multi-language document images, which may contain various types of characters, e.g. Chinese, English, Japanese, or a mixture of them. First, text patches in the document image are segmented by connected component analysis, with the rules for merging connected components chosen according to the results of language identification. Features are then extracted for each basic analysis unit, the text patch. A genetic algorithm is applied for feature fusion and patch type classification. Finally, a Markov random field model is used as a post-processing step to correct misclassified text patch types by considering the document context. Experimental results show that the proposed algorithm clearly improves the performance of handwritten character extraction.
Keywords: handwritten character extraction; multi-language; document segmentation; feature fusion; Markov random field
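
The segmentation step in the abstract (connected component analysis followed by rule-based merging) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 8-connectivity choice, and the single `max_gap` threshold are assumptions made for illustration, standing in for the language-dependent merging rules that the abstract does not spell out.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of the
    8-connected foreground regions in a binary image (1 = ink)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_boxes(boxes, max_gap):
    """Greedily merge boxes, left to right, when their rows overlap and
    the horizontal gap between them is at most max_gap pixels.
    max_gap is a hypothetical stand-in for a language-specific rule."""
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if merged:
            t, l, b, r = merged[-1]
            gap = box[1] - r - 1
            rows_overlap = box[0] <= b and t <= box[2]
            if gap <= max_gap and rows_overlap:
                merged[-1] = (min(t, box[0]), l, max(b, box[2]), max(r, box[3]))
                continue
        merged.append(box)
    return merged


# Toy binary image: two nearby strokes and one distant stroke.
image = [
    [1, 1, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
components = connected_components(image)
patches = merge_boxes(components, max_gap=1)
print(patches)
```

In a pipeline like the one the abstract describes, a different `max_gap` (or a richer rule) would presumably be selected per identified language, and the resulting patches would then be passed on to feature extraction and classification.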
Yonghong Song, Guilin Xiao, Yuanlin Zhang, Lei Yang, Liuliu Zhao
Added 24 Dec 2011
Updated 24 Dec 2011
Type Conference
Year 2011
Where ICDAR
Authors Yonghong Song, Guilin Xiao, Yuanlin Zhang, Lei Yang, Liuliu Zhao