Minimum Risk Annealing for Training Log-Linear Models

When training the parameters for a natural language system, one would prefer to minimize 1-best loss (error) on an evaluation set. Since the error surface for many natural language problems is piecewise constant and riddled with local minima, many systems instead optimize log-likelihood, which is conveniently differentiable and convex. We propose training instead to minimize the expected loss, or risk. We define this expectation using a probability distribution over hypotheses that we gradually sharpen (anneal) to focus on the 1-best hypothesis. Besides the linear loss functions used in previous work, we also describe techniques for optimizing nonlinear functions such as precision or the BLEU metric. We present experiments training log-linear combinations of models for dependency parsing and for machine translation. In machine translation, annealed minimum risk training achieves significant improvements in BLEU over standard minimum error training. We also show improvements in labeled...
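The annealed risk objective the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: hypothesis scores, losses, and the annealing schedule below are assumed toy values, and the distribution over hypotheses is a softmax over log-linear scores sharpened by a temperature `gamma`.

```python
import math

def annealed_risk(scores, losses, gamma):
    """Expected loss (risk) under a gamma-sharpened softmax over hypotheses.

    scores: log-linear model scores, one per hypothesis
    losses: task loss for each hypothesis (e.g. 1 - BLEU)
    gamma:  annealing temperature; as gamma grows, the distribution
            concentrates on the 1-best hypothesis, so risk -> 1-best loss
    """
    m = max(s * gamma for s in scores)                  # subtract max for stability
    weights = [math.exp(s * gamma - m) for s in scores]
    z = sum(weights)
    probs = [w / z for w in weights]
    return sum(p * l for p, l in zip(probs, losses))

# Toy example: three hypotheses with assumed model scores and losses.
scores = [2.0, 1.5, 0.5]
losses = [0.4, 0.1, 0.9]

# At low gamma the risk averages over hypotheses; annealing (raising gamma)
# sharpens the distribution until the risk approaches the 1-best loss, here
# the loss of the highest-scoring hypothesis (0.4).
for gamma in [0.1, 1.0, 10.0, 100.0]:
    print(gamma, round(annealed_risk(scores, losses, gamma), 3))
```

Unlike the piecewise-constant 1-best error surface, this expectation is smooth in the model parameters at any finite `gamma`, which is what makes gradient-based optimization possible before the final annealing step.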
Added: 30 Oct 2010
Updated: 30 Oct 2010
Type: Conference
Year: 2006
Where: ACL
Authors: David A. Smith, Jason Eisner