Bilingual Cluster Based Models for Statistical Machine Translation

We propose a domain-specific model for statistical machine translation. It is well known that domain-specific language models perform well in automatic speech recognition. We show that domain-specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain-specific models. The first is the data sparseness problem, which we overcome with an adaptation technique. The second is domain prediction. To perform adaptation, the domain must be provided; however, in many cases the domain is not known or changes dynamically. In these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using i...
Hirofumi Yamamoto, Eiichiro Sumita
Added 11 Dec 2010
Updated 11 Dec 2010
Type Journal
Year 2008