FLAIRS
2011

Given Bilingual Terminology in Statistical Machine Translation: MWE-Sensitive Word Alignment and Hierarchical Pitman-Yor Process-

This paper considers a scenario in which we are given almost perfect knowledge of the bilingual terminology with respect to a test corpus in Statistical Machine Translation (SMT). When the given terminology is part of a training corpus, one natural strategy in SMT is to use the trained translation model and ignore the given terminology. Two questions then arise. 1) Can a word aligner capture the given terminology? The question matters because, even if the terminology appears in the training corpus, the resulting translation model often does not include it. 2) Are the probabilities in the translation model calculated correctly? To answer these questions, we conducted experiments introducing a Multi-Word Expression-sensitive (MWE-sensitive) word aligner and hierarchical Pitman-Yor process-based translation model smoothing. Using a 200k JP–EN NTCIR corpus, our experimental results show that if we introduce an MWE-sensitive word aligner and a new translation model smoothing, the
Tsuyoshi Okita, Andy Way
Added 28 Aug 2011
Updated 28 Aug 2011
Type Journal
Year 2011
Where FLAIRS