Sciweavers

NLE
2007

Segmentation and alignment of parallel text for statistical machine translation

We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a stochastic generative process over text translation pairs, and derive two alignment procedures from the underlying alignment model. The first is a now-standard dynamic programming alignment procedure, which we use to generate an initial coarse alignment of the parallel text. The second is a divisive clustering procedure for parallel text alignment, which we use to refine the first-pass alignments. This latter procedure is novel in that it permits the segmentation of the parallel text into sub-sentence units, which may be reordered to improve the chunk alignment. The quality of the chunk pairs is measured by the performance of machine translation systems trained on them. We show practical benefits of divisive clustering as well as how system performance can be improved by e...
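The coarse first-pass alignment mentioned in the abstract can be illustrated with a small monotone dynamic program. The sketch below is not the authors' model: it assumes a simple cost based on character-length mismatch, and the bead shapes, penalty values, and the names `BEADS` and `align` are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# Allowed bead shapes (source segments, target segments).
# The penalties are hypothetical values, not taken from the paper.
BEADS = {(1, 1): 0.0, (1, 0): 5.0, (0, 1): 5.0, (2, 1): 2.0, (1, 2): 2.0}


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Monotone alignment of two segment lists by dynamic programming.

    Assumed cost of a bead: absolute difference in character length
    between its source and target sides, plus the bead's penalty.
    Returns a list of (source indices, target indices) pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Fill the table in increasing (i, j) order; every bead moves forward.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(s_len - t_len) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from the end of both texts to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align(["aaaa", "bbbb"], ["xxxxxxxx"])` groups both source segments with the single target segment, since under these assumed costs one 2-1 bead is cheaper than a 1-1 bead followed by a deletion. A refinement pass such as the divisive clustering described in the abstract would then operate within each bead; that step is not sketched here.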
Yonggang Deng, Shankar Kumar, William Byrne
Added 27 Dec 2010
Updated 27 Dec 2010
Type Journal
Year 2007
Where NLE