Large Language Models in Machine Translation

This paper reports on the benefits of large-scale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 trillion tokens, resulting in language models having up to 300 billion n-grams. It is capable of providing smoothed probabilities for fast, single-pass decoding. We introduce a new smoothing method, dubbed Stupid Backoff, that is inexpensive to train on large data sets and approaches the quality of Kneser-Ney smoothing as the amount of training data increases.
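For readers unfamiliar with the scheme, the following is a minimal in-memory sketch of the Stupid Backoff scoring rule summarized in the abstract: score an n-gram by its relative frequency if it was observed, otherwise back off to a one-word-shorter context and multiply by a fixed factor (0.4 in the paper). The helper names (`train_counts`, `stupid_backoff`) and the toy corpus are illustrative assumptions; the paper's actual implementation shards counts across a distributed infrastructure rather than holding them in a single process.

```python
from collections import Counter

def train_counts(tokens, max_order=3):
    # Count all n-grams of orders 1..max_order in a flat token list.
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(word, context, counts, total_tokens, alpha=0.4):
    # Relative frequency of the full n-gram if it was observed; otherwise
    # back off to a shorter context and apply a fixed penalty alpha.
    # The scores are not normalized probabilities, which is what keeps
    # training inexpensive on very large corpora.
    context = tuple(context)
    ngram = context + (word,)
    count = counts[len(ngram)][ngram]
    if count > 0:
        denom = counts[len(context)][context] if context else total_tokens
        return count / denom
    if not context:
        return 0.0  # unseen unigram: relative frequency is zero
    return alpha * stupid_backoff(word, context[1:], counts, total_tokens, alpha)

# Toy usage: score "sat" given the context "the cat" against a tiny corpus.
tokens = "the cat sat on the mat and the cat sat down".split()
counts = train_counts(tokens, max_order=3)
print(stupid_backoff("sat", ("the", "cat"), counts, len(tokens)))
```

Because the scores need not sum to one, no discounting or normalization pass over the counts is required, in contrast to Kneser-Ney smoothing.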
Type Conference
Year 2007
Where EMNLP
Authors Thorsten Brants, Ashok C. Popat, Peng Xu, Franz Josef Och, Jeffrey Dean