Almost all Chinese language processing tasks involve word segmentation of the language input as their first steps, thus robust and reliable segmentation techniques are always requ...
This paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) t...
Traditional word alignment approaches cannot come up with satisfactory results for Named Entities. In this paper, we propose a novel approach using a maximum entropy model for nam...
For Chinese POS tagging, word segmentation is a preliminary step. To avoid error propagation and improve segmentation by utilizing POS information, segmentation and tagging can be...
We investigate the impact of input data scale in corpus-based learning using a study style of Zipf's law. In our research, Chinese word segmentation is chosen as the study ca...