This paper proposes a chunking strategy to detect unknown words in Chinese word segmentation. First, a raw sentence is pre-segmented into a sequence of word atoms 1 using a maximum...
Data-driven learning based on shift reduce parsing algorithms has emerged dependency parsing and shown excellent performance to many Treebanks. In this paper, we investigate the e...
Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply ...
Jia Xu, Jianfeng Gao, Kristina Toutanova, Hermann ...
Adaptor grammars are a framework for expressing and performing inference over a variety of non-parametric linguistic models. These models currently provide state-of-the-art perfor...
This paper presents a method of using mutual information to improve the recognition algorithm of unknown Chinese words, it can resolve the complexity of weight settings and the in...