In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we invest...
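The class-partitioning idea can be sketched as a toy class-based bigram model. The corpus, the word-to-class map, and the function names below are hypothetical illustrations (not the paper's actual method): the probability of a word given its predecessor is factored into a class-transition term and a within-class emission term, which shares statistics across words in the same class.

```python
from collections import defaultdict

# Hypothetical toy corpus and hand-assigned word classes (illustration only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
word_class = {"the": "DET", "cat": "N", "dog": "N", "mat": "N", "rug": "N",
              "sat": "V", "on": "P"}

# Count class-to-class transitions and word emissions within each class.
class_bigram = defaultdict(int)
class_count = defaultdict(int)
emit_count = defaultdict(int)
classes = [word_class[w] for w in corpus]
for c1, c2 in zip(classes, classes[1:]):
    class_bigram[(c1, c2)] += 1
for w, c in zip(corpus, classes):
    emit_count[(c, w)] += 1
    class_count[c] += 1

def class_bigram_prob(prev_word, word):
    """p(word | prev_word) ~ p(class(word) | class(prev_word)) * p(word | class(word))."""
    c_prev, c = word_class[prev_word], word_class[word]
    p_trans = class_bigram[(c_prev, c)] / class_count[c_prev]
    p_emit = emit_count[(c, word)] / class_count[c]
    return p_trans * p_emit
```

Because counts are pooled per class rather than per word, rare words inherit transition statistics from their classmates, which is exactly how classing mitigates sparsity.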
This paper presents an attempt at building a large scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence s...
This paper reports on the benefits of large-scale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 t...
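Counting n-grams over a corpus that large is typically sharded across machines. As a minimal sketch of such a pipeline (the map/reduce structure and function names here are assumptions, not the paper's described infrastructure): each worker counts n-grams on its own text shard, and a reduce step sums the partial counts.

```python
from collections import Counter

def map_shard(tokens, n=3):
    """Map step: emit n-gram counts for one text shard (sketch of a
    MapReduce-style pipeline; hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def reduce_counts(partials):
    """Reduce step: sum the partial counts produced by every shard."""
    total = Counter()
    for c in partials:
        total.update(c)
    return total
```

Note that naive sharding drops n-grams that straddle shard boundaries; a production pipeline would split on document boundaries or overlap shards to avoid this.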
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz Jo...
The paper describes a particular approach to multi-engine machine translation (MEMT), where we make use of voted language models to selectively combine translation outputs from mul...
Performance of n-gram language models depends to a large extent on the amount of training text material available for building the models and the degree to which this text matches...
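The dependence on training-data match can be made concrete with a toy bigram model: text resembling the training material receives lower perplexity than mismatched text. The corpus, smoothing choice (add-one), and function names below are illustrative assumptions, not the paper's setup:

```python
import math
from collections import Counter

def train_bigram(tokens):
    """MLE bigram model with add-one smoothing over the observed vocabulary."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))
    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

def perplexity(prob, tokens):
    """Per-bigram perplexity of a token sequence under the model."""
    log_p = sum(math.log2(prob(p, w)) for p, w in zip(tokens, tokens[1:]))
    return 2 ** (-log_p / (len(tokens) - 1))

# Tiny training text (illustration only).
prob = train_bigram("the cat sat on the mat".split())
```

In-domain text ("the cat sat on") scores markedly better than text with unseen vocabulary, which is the matching effect the abstract describes, in miniature.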