Sciweavers

ACL
2008

Mining Parenthetical Translations from the Web by Word Alignment

Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their English translations inside a pair of parentheses. We present a method that extracts such translations from a large collection of web documents by building a partially parallel corpus and using a word alignment algorithm to identify the terms being translated. The method generalizes across the translations of different terms and can reliably extract translations that occur only once on the entire web. Our experiment on Chinese web pages produced more than 26 million translation pairs, over two orders of magnitude more than previous results. We show that adding the extracted translation pairs as training data provides a significant increase in the BLEU score of a statistical machine translation system.
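As a rough illustration of the first step described in the abstract, the sketch below collects candidate pairs from text in which a Chinese phrase is followed by an English parenthetical. It is not the authors' implementation: the regular expression, the function name `extract_candidates`, and the `max_left_chars` window are all assumptions made for this example. The left-hand side deliberately over-generates, since deciding which part of it is actually being translated is the job the abstract assigns to word alignment.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper): a run of
# CJK ideographs immediately followed by ASCII or full-width parentheses
# that contain English text.
CANDIDATE = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(（]\s*([A-Za-z][A-Za-z0-9 .'\-]*?)\s*[)）]"
)


def extract_candidates(text, max_left_chars=10):
    """Return (left_context, english) pairs for each parenthetical found.

    left_context is at most max_left_chars ideographs preceding the
    parenthesis. It may contain more than the translated term; a later
    alignment step would have to pick the right boundary.
    """
    pairs = []
    for match in CANDIDATE.finditer(text):
        left, english = match.group(1), match.group(2)
        pairs.append((left[-max_left_chars:], english))
    return pairs


if __name__ == "__main__":
    sample = "我们使用机器翻译(machine translation)系统"
    for left, english in extract_candidates(sample):
        print(left, "->", english)
```

Pairs collected this way form only a partially parallel corpus: in the sample, the left context also includes words meaning "we use", which are not part of the translated term.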
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ACL
Authors Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Marius Pasca