We tackle the previously unaddressed problem of unsupervised determination of the optimal morphological segmentation for statistical machine translation (SMT) and propose a segmen...
Data sparseness is one of the factors that degrade statistical machine translation (SMT). Existing work has shown that using morphosyntactic information is an effective solution t...
For many applications, such as machine translation and bilingual information retrieval, bilingual corpora play an important role in training the system. Because they a...
Phrasal segmentation models define a mapping from the words of a sentence to sequences of translatable phrases. We discuss the estimation of these models from large quantities of ...
HMM-based models are developed for the alignment of words and phrases in bitext. The models are formulated so that alignment and parameter estimation can be performed efficiently....