ICDAR
2005
IEEE

Language Identification of Character Images Using Machine Learning Techniques

In this paper, we propose a new approach for identifying the language type of character images. We do this by classifying individual character images to determine the language boundaries in multilingual documents. Two effective methods are considered for this purpose: the prototype classification method and support vector machines (SVM). Due to the large size of our training dataset, we further propose a technique to speed up the training process for both methods. Applying the two methods to classifying characters into Chinese, English, and Japanese (including Hiragana and Katakana) has produced very accurate and comparable test results.
Keywords: language identification, prototype classification method, support vector machines (SVM)
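The abstract says language boundaries are found by classifying individual character images. As a rough illustration of that last step only, the sketch below collapses a sequence of per-character language labels into contiguous runs and reads the boundaries off the run ends. It is not the authors' implementation: the function name `language_runs`, the label strings, and the sample predictions are all hypothetical, and the labels are assumed to come from some already-trained character classifier (for example a prototype classifier or an SVM).

```python
from itertools import groupby


def language_runs(labels):
    """Collapse per-character language labels into (label, start, end) runs.

    `end` is exclusive, so a language boundary sits at the `end` index of
    every run except the last one.
    """
    runs = []
    position = 0
    for label, group in groupby(labels):
        length = sum(1 for _ in group)  # number of consecutive characters sharing this label
        runs.append((label, position, position + length))
        position += length
    return runs


# Hypothetical per-character predictions for a seven-character text line.
predicted = ["English", "English", "Chinese", "Chinese", "Chinese", "Japanese", "English"]
runs = language_runs(predicted)

# Indices at which the predicted language changes.
boundaries = [end for _, _, end in runs[:-1]]
```

A real pipeline would probably smooth isolated misclassifications before extracting boundaries; this sketch treats every label change as a boundary.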
Ying-Ho Liu, Fu Chang, Chin-Chin Lin
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where ICDAR