Smoothing a Tera-word Language Model

Frequency counts from very large corpora, such as the Web 1T dataset, have recently become available for language modeling. Omitting low-frequency n-gram counts is a practical necessity for datasets of this size, but naive implementations of standard smoothing methods do not realize the full potential of such large datasets with missing counts. In this paper I present a new smoothing algorithm that combines the Dirichlet prior form of MacKay and Peto (1995) with the modified back-off estimates of Kneser and Ney (1995), yielding a 31% perplexity reduction on the Brown corpus compared to a baseline implementation of Kneser-Ney discounting.
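The abstract combines two ingredients: Dirichlet-prior smoothing, which shrinks observed counts toward a prior distribution, and Kneser-Ney continuation counts, which estimate that back-off distribution from the number of distinct contexts a word follows. The bigram sketch below illustrates how the two fit together; it is not the paper's tera-word algorithm, and the function name, alpha value, and end-of-sequence handling are my own assumptions.

from collections import Counter, defaultdict

def dirichlet_kn_bigram(tokens, alpha=100.0):
    """Bigram probabilities: Dirichlet-prior smoothing (MacKay & Peto, 1995)
    whose prior mean is a Kneser-Ney-style continuation distribution
    (Kneser & Ney, 1995). alpha is an illustrative prior strength,
    not a value from the paper."""
    unigram = Counter(tokens)  # context counts (final token ignored for brevity)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Continuation counts: how many distinct left contexts each word follows.
    left = defaultdict(set)
    for u, w in bigrams:
        left[w].add(u)
    total = sum(len(s) for s in left.values())

    def p_cont(w):
        # Kneser-Ney continuation probability: fraction of bigram types ending in w.
        return len(left[w]) / total if total else 0.0

    def prob(w, u):
        # Observed count shrunk toward the continuation prior with weight alpha.
        return (bigrams[(u, w)] + alpha * p_cont(w)) / (unigram[u] + alpha)

    return prob

# Example usage:
#   p = dirichlet_kn_bigram("the cat sat on the mat".split())
#   p("cat", "the")   # probability of "cat" following "the"

The Dirichlet-prior form makes the smoothing strength explicit: as the context count unigram[u] grows, the estimate approaches the raw relative frequency; for sparse contexts it falls back toward the continuation distribution.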
Type: Conference
Year: 2008
Where: ACL
Authors: Deniz Yuret