We present in this paper a comparison between three segmentation systems for the Vietnamese language. Indeed, the majority of Vietnamese words is built by semantic composition fro...
Quang Thang Dinh, Hong Phuong Le, Thi Minh Huyen N...
There are two main topics in this paper: (i) Vietnamese words are recognized and sentences are segmented into words by using probabilistic models; (ii) the optimum probabilistic mo...
Word segmentation is the first and obligatory task for every NLP. For inflectional languages like English, French, Dutch,.. their word boundaries are simply assumed to be whitespa...
We present a theoretical and empirical comparative analysis of the two dominant categories of approaches in Chinese word segmentation: word-based models and character-based models...
Thispaper presents a text word extraction algorithm that takes a set of bounding boxes of glyphs and their associated text lines of a given document andpartitions the glyphs into ...