Sciweavers

ACL
2012

Enhancing Statistical Machine Translation with Character Alignment

The dominant practice in statistical machine translation (SMT) is to use the same Chinese word segmentation specification in both the alignment step and the translation rule induction step when building a Chinese-English SMT system. This can be suboptimal, because a word segmentation that is better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses two different segmentation specifications for alignment and translation respectively: we use the Chinese character as the basic unit for alignment, and then convert this character alignment into a conventional word alignment for translation rule induction. Experimentally, our approach outperformed two baselines, a fully word-based system (using words for both alignment and translation) and a fully character-based system, in terms of both alignment quality and translation performance.
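The abstract does not say exactly how character links are turned into word links. As a minimal sketch, assuming a simple union rule (a Chinese word is aligned to an English word whenever any of its characters is aligned to it), the conversion step could look like the Python below. The function name `char_to_word_alignment`, the union rule, and the example sentence are hypothetical illustrations, not the authors' published procedure.

```python
from typing import List, Set, Tuple


def char_to_word_alignment(
    words: List[str],
    char_links: Set[Tuple[int, int]],
) -> Set[Tuple[int, int]]:
    """Project character-level links (char_idx, en_idx) onto word-level
    links (word_idx, en_idx).

    Assumption (hypothetical): union rule, i.e. a Chinese word is linked to
    an English word if ANY of its characters is linked to that word.
    """
    # Map each character position to the index of the word that contains it.
    char_to_word: List[int] = []
    for w_idx, word in enumerate(words):
        char_to_word.extend([w_idx] * len(word))

    # Replace each character index with its word index; the set removes
    # duplicate links produced by several characters of the same word.
    return {(char_to_word[c_idx], e_idx) for c_idx, e_idx in char_links}


if __name__ == "__main__":
    # Hypothetical example: "中国 经济 发展" (3 words, 6 characters)
    # aligned to "China 's economic development" (English indices 0..3).
    words = ["中国", "经济", "发展"]
    char_links = {(0, 0), (1, 0), (2, 2), (3, 2), (4, 3), (5, 3)}
    print(sorted(char_to_word_alignment(words, char_links)))
    # prints [(0, 0), (1, 2), (2, 3)]
```

Because alignment runs over characters, the segmentation passed as `words` can be chosen for translation rule induction independently of the units used during alignment, which is the decoupling the abstract describes.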
Added 29 Sep 2012
Updated 29 Sep 2012
Type Conference
Year 2012
Where ACL
Authors Ning Xi, Guangchao Tang, Xinyu Dai, Shujian Huang, Jiajun Chen