Inducing Sentence Structure from Parallel Corpora for Reordering

When translating among languages that differ substantially in word order, machine translation (MT) systems benefit from syntactic preordering, an approach that uses features from a syntactic parse to permute source words into a target-language-like order. This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of using a supervised parser trained on a treebank. These induced parses are used to preorder source sentences. We demonstrate that our induced parser is effective: it not only improves a state-of-the-art phrase-based system with integrated reordering, but also approaches the performance of a recent preordering method based on a supervised parser. These results show that the syntactic structure relevant to MT preordering can be learned automatically from parallel text, thus establishing a new application for unsupervised grammar induction.
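As a rough illustration of what preordering means, the sketch below permutes the words of a source sentence by walking a binary tree and optionally swapping the two children at each node. It is not the paper's model: the `Node` class, the `swap` flag, and the toy English sentence are all hypothetical, and the swap decisions are set by hand here, whereas a real system would predict them from features of the parse.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """Hypothetical binary tree node; `swap` says whether to reverse its children."""
    left: "Tree"
    right: "Tree"
    swap: bool = False


# A tree is either a single word (leaf) or an internal binary node.
Tree = Union[str, Node]


def preorder(tree: Tree) -> List[str]:
    """Return the words of `tree` in the order implied by the swap flags."""
    if isinstance(tree, str):
        return [tree]
    left = preorder(tree.left)
    right = preorder(tree.right)
    # Swapping the children moves the right span in front of the left span.
    return right + left if tree.swap else left + right


# Toy example: move the verb after its object (a verb-final target order).
verb_phrase = Node("ate", Node("an", "apple"), swap=True)
sentence = Node("John", verb_phrase)
reordered = preorder(sentence)  # ["John", "an", "apple", "ate"]
```

In this toy setting, the reordered sentence would then be passed to the translation system in place of the original word order.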
John DeNero, Jakob Uszkoreit
Added 20 Dec 2011
Updated 20 Dec 2011
Type Journal
Year 2011