N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation

In domains with insufficient matched training data, language models are often constructed by interpolating component models trained on partially matched corpora. Since the n-grams from such corpora may not be of equal relevance to the target domain, we propose an n-gram weighting technique that adjusts the component n-gram probabilities based on features derived from readily available segmentation and metadata information for each corpus. Using a log-linear combination of such features, the resulting model achieves up to a
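The idea described in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: each component probability is scaled by a log-linear weight (the exponential of a dot product between features and parameters), the scaled components are linearly interpolated, and the result is renormalized over the vocabulary. All names here (`ngram_weight`, `weighted_interpolation`, the `lam`, `prob`, `feats`, and `theta` keys) and the toy feature are hypothetical.

```python
import math


def ngram_weight(features, theta):
    """Log-linear weight: exp of the dot product of features and parameters."""
    return math.exp(sum(f * t for f, t in zip(features, theta)))


def weighted_interpolation(history, vocab, components):
    """Return {word: p(word | history)} under weighted interpolation.

    Each component is a dict with hypothetical keys:
      'lam'   : interpolation weight of the component
      'prob'  : callable (history, word) -> component probability
      'feats' : callable (history, word) -> list of feature values
      'theta' : list of feature parameters, same length as the features
    """
    scores = {}
    for w in vocab:
        scores[w] = sum(
            c["lam"]
            * ngram_weight(c["feats"](history, w), c["theta"])
            * c["prob"](history, w)
            for c in components
        )
    # Renormalize so the weighted mixture is a proper distribution.
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}


def make_component(lam, table, theta):
    """Build a toy component whose single feature fires on the word 'a'."""
    return {
        "lam": lam,
        "prob": lambda h, w: table[w],
        "feats": lambda h, w: [1.0 if w == "a" else 0.0],
        "theta": theta,
    }


vocab = ["a", "b"]
table_a = {"a": 0.8, "b": 0.2}
table_b = {"a": 0.4, "b": 0.6}

# With all parameters at zero every weight is 1, so this is plain interpolation.
plain = weighted_interpolation(
    (), vocab,
    [make_component(0.5, table_a, [0.0]), make_component(0.5, table_b, [0.0])],
)

# A positive parameter on the first component boosts its probability for 'a'.
boosted = weighted_interpolation(
    (), vocab,
    [make_component(0.5, table_a, [1.0]), make_component(0.5, table_b, [0.0])],
)
```

With all feature parameters set to zero the sketch reduces to ordinary linear interpolation, which makes it easy to check against a baseline. How the features are derived from segmentation and metadata, and how the parameters are tuned, is not specified in the text above and is left out here.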

Bo-June Paul Hsu, James R. Glass

Component N-gram Probabilities | EMNLP 2008 | Insufficient Matched Training | N-gram Weighting Technique | Natural Language Processing

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	EMNLP
Authors	Bo-June Paul Hsu, James R. Glass
