Sciweavers

LREC
2008

Word Segmentation of Vietnamese Texts: a Comparison of Approaches

This paper compares three word segmentation systems for the Vietnamese language. Most Vietnamese words are built by semantic composition from about 7,000 syllables, which also have a meaning as isolated words. Identifying word boundaries in a text is therefore not a simple task, and ambiguities often arise. Beyond presenting the tested systems, we propose a standard definition of word segmentation in Vietnamese and introduce a reference corpus developed for evaluating this task. The results confirm that the task can be handled relatively well by automatic means, although a solution is still needed to account for out-of-vocabulary words.
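To make the boundary ambiguity concrete, here is a minimal sketch of a greedy longest-match segmenter over syllables. It is an illustration only and is not claimed to be one of the three systems compared in the paper; the function name `segment_longest_match`, the `max_len` parameter, and the four-entry `TOY_LEXICON` are assumptions introduced for this example.

```python
def segment_longest_match(syllables, lexicon, max_len=3):
    """Greedy left-to-right longest-match segmentation (illustrative only)."""
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking down to one syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            # A single syllable is always accepted, so syllables missing
            # from the lexicon fall through as one-syllable words.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Toy lexicon (an assumption for this sketch, not the paper's resource).
TOY_LEXICON = {"học", "sinh", "học sinh", "sinh học"}

sentence = "học sinh học sinh học"
result = segment_longest_match(sentence.split(), TOY_LEXICON)
print(result)  # ['học sinh', 'học sinh', 'học']
```

With this toy lexicon the greedy strategy groups the syllables as "học sinh | học sinh | học", whereas the reading "học sinh | học | sinh học" (pupils study biology) requires a different boundary. Syllables absent from the lexicon are simply emitted one by one, which hints at why out-of-vocabulary words need separate handling.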
Added: 29 Oct 2010
Updated: 29 Oct 2010
Type: Conference
Year: 2008
Where: LREC
Authors: Quang Thang Dinh, Hong Phuong Le, Thi Minh Huyen Nguyen, Cam-Tu Nguyen, Mathias Rossignol, Xuân Luong Vu