This paper presents a trainable rule-based algorithm for performing word segmentation. The algorithm provides a simple, language-independent alternative to large-scale lexicai-bas...
We proposed a subword-based tagging for Chinese word segmentation to improve the existing character-based tagging. The subword-based tagging was implemented using the maximum entr...
In this paper, we present an unsupervized segmentation system tested on Mandarin Chinese. Following Harris's Hypothesis in Kempe (1999) and Tanaka-Ishii's (2005) reformu...
This paper describes a hybrid model that combines machine learning with linguistic heuristics for integrating unknown word identification with Chinese word segmentation. The model...
We present a study of new word identification (NWI) to improve the performance of a Chinese word segmenter. In this paper the distribution and types of new words are discussed emp...