Forest-based Translation Rule Extraction

Translation rule extraction is a fundamental problem in machine translation, especially for linguistically syntax-based systems that need parse trees from either or both sides of the bitext. The current dominant practice uses only 1-best trees, which adversely affects the quality of the rule set because of parsing errors. We therefore propose a novel approach that extracts rules from a packed forest, which compactly encodes exponentially many parses. Experiments show that this method improves translation quality by over 1 BLEU point on a state-of-the-art tree-to-string system, and is 0.5 points better than (and twice as fast as) extracting from 30-best parses. When combined with our previous work on forest-based decoding, it achieves a 2.5 BLEU point improvement over the baseline, and even outperforms the hierarchical system Hiero by 0.7 points.
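To make the phrase "compactly encodes exponentially many parses" concrete, here is a minimal Python sketch. It is not the rule extraction algorithm of the paper; it only illustrates the underlying data structure under simplifying assumptions. The forest is modeled as a hypergraph in which each node maps to a list of hyperedges (tuples of child nodes), and the names `build_toy_forest` and `count_trees` are hypothetical helpers introduced for this illustration.

```python
from functools import lru_cache


def build_toy_forest(levels):
    """Build a toy packed forest (an assumption for illustration only).

    Each node X_i has two alternative hyperedges, both leading to X_{i+1},
    so every level doubles the number of distinct trees while adding only
    one node and two hyperedges.
    """
    forest = {}
    for i in range(levels):
        child = f"X{i + 1}"
        forest[f"X{i}"] = [(child,), (child,)]  # two competing analyses
    forest[f"X{levels}"] = [()]  # leaf: one hyperedge with no children
    return forest


def count_trees(forest, root):
    """Count the trees encoded under `root` by dynamic programming.

    The count for a node is the sum over its hyperedges of the product
    of the counts of that hyperedge's children.
    """

    @lru_cache(maxsize=None)
    def count(node):
        total = 0
        for tails in forest[node]:
            product = 1
            for child in tails:
                product *= count(child)
            total += product
        return total

    return count(root)


forest = build_toy_forest(10)
num_nodes = len(forest)
num_hyperedges = sum(len(edges) for edges in forest.values())
num_trees = count_trees(forest, "X0")
print(num_nodes, num_hyperedges, num_trees)
```

In this toy case the forest grows linearly with the number of ambiguous levels while the number of encoded trees doubles at each level, which is why enumerating a k-best list covers only a small fraction of what the forest represents.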
Haitao Mi, Liang Huang
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Authors Haitao Mi, Liang Huang