Sciweavers

LREC
2008

Using Reordering in Statistical Machine Translation based on Alignment Block Classification

Statistical Machine Translation (SMT) is based on alignment models which learn the word correspondences between source and target language from bilingual corpora. These models are assumed to be capable of learning reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. In this paper, we show that SMT can take advantage of inductive learning in order to solve reordering problems. Given a word alignment, we identify those pairs of consecutive source blocks (sequences of words) whose translation is swapped, i.e., those blocks which, if swapped, generate a correct monotone translation. Afterwards, we classify these pairs into groups, recursively following a co-occurrence block criterion, in order to infer reorderings. Inside the same group, we allow new internal combinations in order to generalize the reordering to unseen pairs of blocks. Then, we identify the pairs of blocks in the source corpora (both training and test)...
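The block-identification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of an alignment as a set of (source index, target index) pairs, the `max_block_len` bound, and the Spanish-English toy example are all assumptions made for illustration. A pair of consecutive source blocks is reported as swapped when every target word aligned to the second block precedes every target word aligned to the first.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: a word alignment is a set of
# (source_index, target_index) links, both 0-based.
Alignment = Set[Tuple[int, int]]
Block = Tuple[int, int]  # inclusive (start, end) source positions


def target_span(alignment: Alignment, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Return (min, max) target positions linked to source words start..end."""
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None  # block has no aligned words
    return min(targets), max(targets)


def find_swapped_pairs(alignment: Alignment, src_len: int,
                       max_block_len: int = 3) -> List[Tuple[Block, Block]]:
    """List pairs of consecutive source blocks whose translations are swapped."""
    pairs = []
    for i in range(src_len):
        # First block is [i, k]; leave room for at least one word after it.
        for k in range(i, min(i + max_block_len, src_len - 1)):
            span_a = target_span(alignment, i, k)
            if span_a is None:
                continue
            # Second block is [k + 1, j], immediately following the first.
            for j in range(k + 1, min(k + 1 + max_block_len, src_len)):
                span_b = target_span(alignment, k + 1, j)
                if span_b is None:
                    continue
                # Swapped: the second block's translation ends before
                # the first block's translation begins.
                if span_b[1] < span_a[0]:
                    pairs.append(((i, k), (k + 1, j)))
    return pairs


# Toy example (assumed): "la casa blanca" -> "the white house"
toy_alignment = {(0, 0), (1, 2), (2, 1)}
swapped = find_swapped_pairs(toy_alignment, src_len=3)
print(swapped)
```

In the toy example the blocks "casa" and "blanca" are reported as a swapped pair, since exchanging them on the source side yields a monotone alignment with "white house". The grouping and generalization steps mentioned in the abstract are not covered by this sketch.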
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Marta R. Costa-Jussà, José A. R. Fonollosa, Enric Monte