Discriminative Sample Selection for Statistical Machine Translation

Production of parallel training corpora for the development of statistical machine translation (SMT) systems for resource-poor languages usually requires extensive manual effort. Active sample selection aims to reduce the labor, time, and expense incurred in producing such resources, attaining a given performance benchmark with the smallest possible training corpus by choosing informative, nonredundant source sentences from an available candidate pool for manual translation. We present a novel, discriminative sample selection strategy that preferentially selects batches of candidate sentences with constructs that lead to erroneous translations on a held-out development set. The proposed strategy supports a built-in diversity mechanism that reduces redundancy in the selected batches. Simulation experiments on English-to-Pashto and Spanish-to-English translation tasks demonstrate the superiority of the proposed approach to a number of competing techniques, such as random selection, diss...
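The abstract describes batch selection driven by error-prone constructs plus a built-in diversity mechanism, but the listing gives no algorithmic detail. The following is a minimal illustrative sketch only, not the authors' method: it assumes a hypothetical table `error_weight` that maps n-grams to a score reflecting how strongly they are associated with translation errors on the development set, and it approximates diversity by decaying the weight of any n-gram already covered by a selected sentence. The function names, the length normalization, and the `decay` parameter are all assumptions made for illustration.

```python
def ngrams(tokens, n_max=2):
    """Return the set of distinct n-grams (as tuples) of order 1..n_max."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def select_batch(pool, error_weight, batch_size, decay=0.5, n_max=2):
    """Greedily pick `batch_size` sentence indices from `pool`.

    pool         : list of tokenized source sentences (lists of str)
    error_weight : hypothetical dict mapping n-gram tuples to error scores
    decay        : factor applied to the weight of already-covered n-grams
                   (assumed diversity heuristic; 1.0 disables it)
    """
    weights = dict(error_weight)      # work on a copy, leave the input intact
    remaining = list(range(len(pool)))
    chosen = []

    def score(idx):
        tokens = pool[idx]
        if not tokens:
            return 0.0
        # Length-normalized sum of the current weights of distinct n-grams.
        total = sum(weights.get(g, 0.0) for g in ngrams(tokens, n_max))
        return total / len(tokens)

    while remaining and len(chosen) < batch_size:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
        # Down-weight constructs the batch now covers to reduce redundancy.
        for g in ngrams(pool[best], n_max):
            if g in weights:
                weights[g] *= decay
    return chosen


pool = [
    ["the", "red", "car"],
    ["the", "red", "car"],    # exact duplicate of the first sentence
    ["a", "blue", "boat"],
]
error_weight = {("red",): 1.0, ("blue",): 0.8}

with_diversity = select_batch(pool, error_weight, batch_size=2, decay=0.5)
without_diversity = select_batch(pool, error_weight, batch_size=2, decay=1.0)
```

In this toy pool, decaying covered n-grams makes the second pick skip the duplicate sentence in favor of the one containing a different error-prone word, whereas with `decay=1.0` the duplicate is selected.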

Sankaranarayanan Ananthakrishnan, Rohit Prasad, David Stallard, Prem Natarajan

Active Sample Selection | Discriminative Sample Selection | EMNLP 2010 | Natural Language Processing | Sample Selection |

» Structural Feature Selection For English-Korean Statistical Machine Translation

» A Discriminative Latent Variable Model for Statistical Machine Translation

» Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Mod...

» Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation

» Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

» A Discriminative Syntactic Word Order Model for Machine Translation

» Semi-Supervised Training for Statistical Word Alignment

» Discriminative Reranking for Machine Translation

Post Info

Added	11 Feb 2011
Updated	11 Feb 2011
Type	Journal
Year	2010
Where	EMNLP
Authors	Sankaranarayanan Ananthakrishnan, Rohit Prasad, David Stallard, Prem Natarajan
