ACL
2010

Intelligent Selection of Language Model Training Data

We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.
Robert C. Moore, William Lewis
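
The selection criterion described in the abstract can be illustrated with a minimal sketch: score each sentence of the general text source by its cross-entropy under the domain-specific model minus its cross-entropy under the non-domain-specific model, and keep the low-scoring sentences. The abstract does not specify the model type or the cutoff, so the add-one-smoothed unigram models, the function names, and the `threshold` parameter below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab):
    """Add-one-smoothed unigram model over a fixed vocabulary.

    Assumption: unigrams are used only to keep the sketch short.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def cross_entropy(sentence, model):
    """Per-token cross-entropy (in bits) of a non-empty sentence."""
    tokens = sentence.split()
    return -sum(math.log2(model[t]) for t in tokens) / len(tokens)


def select_by_cross_entropy_difference(in_domain, general, threshold=0.0):
    """Return general-corpus sentences whose score falls below `threshold`.

    score(s) = H_in_domain(s) - H_general(s); lower means more domain-like.
    `threshold` is a hypothetical tuning parameter.
    """
    vocab = {tok for s in in_domain + general for tok in s.split()}
    lm_in = train_unigram(in_domain, vocab)
    lm_gen = train_unigram(general, vocab)
    scored = [
        (cross_entropy(s, lm_in) - cross_entropy(s, lm_gen), s)
        for s in general
        if s.split()  # skip empty lines
    ]
    return [s for score, s in sorted(scored) if score < threshold]


if __name__ == "__main__":
    in_domain = ["the patient received a dose", "the dose was reduced"]
    general = [
        "the patient was given a lower dose",
        "stocks fell sharply on monday",
        "the market closed lower",
    ]
    for sentence in select_by_cross_entropy_difference(in_domain, general):
        print(sentence)
```

In this toy setup the cutoff is fixed at zero; in practice one would presumably tune it on held-out in-domain data, and use stronger n-gram models than the unigrams assumed here.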
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where ACL
Authors Robert C. Moore, William Lewis