EMNLP
2009

Discriminative Corpus Weight Estimation for Machine Translation

Current statistical machine translation (SMT) systems are trained on sentence-aligned and word-aligned parallel text collected from various sources. Translation model parameters are estimated from the word alignments, and the quality of the translations on a given test set depends on the parameter estimates. At least two factors affect the parameter estimation: domain match and training data quality. This paper describes a novel approach for automatically detecting and down-weighting certain parts of the training corpus by assigning a weight to each sentence in the training bitext so as to optimize a discriminative objective function on a designated tuning set. In this way, the proposed method can limit the negative effects of low-quality training data and can adapt the translation model to the domain of interest. It is shown that such discriminative corpus weights can provide significant improvements in Arabic-English translation under various conditions, using a state-of-the-a...
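To make the idea of per-sentence weights concrete, here is a minimal sketch of how such weights could enter a relative-frequency estimate of phrase translation probabilities. It is an illustration under stated assumptions, not the authors' implementation: the function name `weighted_translation_probs`, the toy phrase pairs, and the weights are all hypothetical, and the sketch does not show how the weights would be tuned against a discriminative objective.

```python
from collections import defaultdict


def weighted_translation_probs(phrase_pairs_by_sentence, sentence_weights):
    """Estimate p(target | source) from sentence-weighted phrase-pair counts.

    phrase_pairs_by_sentence: one list of (source, target) pairs per training
        sentence (hypothetical input format, assumed for this sketch).
    sentence_weights: one non-negative weight per training sentence.
    """
    pair_counts = defaultdict(float)
    source_counts = defaultdict(float)

    # Each phrase pair contributes its sentence's weight instead of a count of 1.
    for pairs, weight in zip(phrase_pairs_by_sentence, sentence_weights):
        for source, target in pairs:
            pair_counts[(source, target)] += weight
            source_counts[source] += weight

    # Relative frequency over weighted counts; skip sources whose total weight is 0.
    return {
        (source, target): count / source_counts[source]
        for (source, target), count in pair_counts.items()
        if source_counts[source] > 0
    }


# Toy bitext: three sentences, each yielding one extracted phrase pair.
toy_pairs = [
    [("kitab", "book")],
    [("kitab", "book")],
    [("kitab", "writing")],  # assumed to come from a noisy or out-of-domain sentence
]

uniform = weighted_translation_probs(toy_pairs, [1.0, 1.0, 1.0])
down_weighted = weighted_translation_probs(toy_pairs, [1.0, 1.0, 0.2])
```

With uniform weights the estimate reduces to ordinary relative frequency. Lowering the weight of the third sentence shifts probability mass toward the phrase pair seen in the other two, which is the kind of effect that down-weighting low-quality or out-of-domain data is meant to have.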

Spyros Matsoukas, Antti-Veikko I. Rosti, Bing Zhang

EMNLP 2009 | Natural Language Processing | Parameter | Training | Translation Model

Related Content

» Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation

» Unsupervised Discriminative Language Model Training for Machine Translation using Simulate...

» Estimating Translation Probabilities from the Web for Structured Queries on CLIR

» Semi-Supervised Training for Statistical Word Alignment

» Discriminative Sample Selection for Statistical Machine Translation

» Source-Language Features and Maximum Correlation Training for Machine Translation Evaluatio...

» Weighted Alignment Matrices for Statistical Machine Translation

» Using Comparable Corpora to Adapt a Translation Model to Domains

» A Comparative Evaluation of Data-driven Models in Translation Selection of Machine Translat...

Post Info

Added	17 Feb 2011
Updated	17 Feb 2011
Type	Journal
Year	2009
Where	EMNLP
Authors	Spyros Matsoukas, Antti-Veikko I. Rosti, Bing Zhang
