Minimum Error Rate Training by Sampling the Translation Lattice

Minimum Error Rate Training is the most widely used algorithm for training log-linear model parameters in state-of-the-art Statistical Machine Translation systems. In its original formulation, the algorithm uses N-best lists output by the decoder to grow the translation pool that shapes the surface on which the actual optimization is performed. Recent work has extended the algorithm to use the entire translation lattice built by the decoder instead of N-best lists. We propose a third, intermediate approach: growing the translation pool with samples randomly drawn from the translation lattice. We empirically measure a systematic improvement in BLEU scores compared to training with N-best lists, without incurring the increase in computational complexity associated with operating on the whole lattice.
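As a rough illustration of the idea of growing a translation pool from lattice samples, the sketch below performs random walks over a toy lattice. The abstract does not say how the samples are drawn, so everything here is an assumption: the lattice layout, the names `LATTICE`, `sample_path` and `grow_pool`, and the choice of picking each edge with probability proportional to exp(w · f). That local rule is a simple biased random walk, not necessarily the authors' sampling scheme.

```python
import math
import random

# Hypothetical toy lattice: node -> list of (next_node, word, feature_vector).
LATTICE = {
    0: [(1, "the", (0.2, 1.0)), (1, "a", (0.1, 1.0))],
    1: [(2, "house", (0.5, 1.0)), (2, "home", (0.4, 1.0))],
    2: [],  # final node, no outgoing edges
}


def sample_path(lattice, weights, start, final, rng):
    """Draw one translation by a random walk from start to final.

    Assumption: each outgoing edge is chosen with probability
    proportional to exp(w . f), normalised locally at the current node.
    """
    node = start
    words = []
    feats = [0.0] * len(weights)
    while node != final:
        edges = lattice[node]
        scores = [
            math.exp(sum(w * f for w, f in zip(weights, edge[2])))
            for edge in edges
        ]
        next_node, word, edge_feats = rng.choices(edges, weights=scores, k=1)[0]
        words.append(word)
        # Accumulate the path's feature vector for later re-scoring.
        feats = [a + b for a, b in zip(feats, edge_feats)]
        node = next_node
    return " ".join(words), tuple(feats)


def grow_pool(pool, lattice, weights, n_samples, start=0, final=2, seed=0):
    """Add n_samples sampled translations to the pool (a set, so duplicates merge)."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        pool.add(sample_path(lattice, weights, start, final, rng))
    return pool
```

Calling `grow_pool(set(), LATTICE, (1.0, 0.0), 50)` returns a set of (sentence, feature vector) pairs. In an actual training loop, such a pool would be extended at each iteration with samples drawn under the current weights, and the error-rate optimization would then run over the accumulated pool.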
Samidh Chatterjee, Nicola Cancedda
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010