Less is More: Significance-Based N-gram Selection for Smaller, Better Language Models

The recent availability of large corpora for training N-gram language models has shown the utility of models of higher order than just trigrams. In this paper, we investigate methods to control the increase in model size resulting from applying standard methods at higher orders. We introduce significance-based N-gram selection, which not only reduces model size, but also improves perplexity for several smoothing methods, including Katz backoff and absolute discounting. We also show that, when combined with a new smoothing method and a novel variant of weighted-difference pruning, our selection method performs better in the trade-off between model size and perplexity than the best pruning method we found for modified Kneser-Ney smoothing.
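The abstract compares models on the trade-off between size and perplexity and mentions weighted-difference pruning. As a rough illustration only, the sketch below shows the standard count-weighted log-probability-difference pruning criterion (Seymore & Rosenfeld style) and a perplexity computation; it is not the paper's novel pruning variant or its significance test, and all names (weighted_difference, prune, perplexity, ngram_stats) are hypothetical.

```python
import math

def weighted_difference(count, logp_full, logp_backoff):
    # Standard weighted-difference score: the N-gram's training count
    # times the loss in log probability incurred by backing off to the
    # lower-order estimate.
    return count * (logp_full - logp_backoff)

def prune(ngram_stats, threshold):
    # Keep only N-grams whose weighted difference reaches the threshold.
    # ngram_stats maps an N-gram tuple to (count, logp_full, logp_backoff).
    return {
        ngram: stats
        for ngram, stats in ngram_stats.items()
        if weighted_difference(*stats) >= threshold
    }

def perplexity(log2_probs):
    # Perplexity of a test set, given per-token log-base-2 probabilities.
    return 2.0 ** (-sum(log2_probs) / len(log2_probs))

# Toy usage with made-up numbers: the frequent trigram whose full-model
# probability clearly beats its backoff estimate is kept, the rare one
# that gains almost nothing over backoff is dropped.
stats = {
    ("in", "the", "end"): (120, math.log(0.02), math.log(0.005)),
    ("in", "the", "zoo"): (2, math.log(0.001), math.log(0.0009)),
}
kept = prune(stats, threshold=1.0)
```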
Robert C. Moore, Chris Quirk
Type Conference
Year 2009
Where EMNLP