Distribution-Based Pruning of Backoff Language Models



We propose a distribution-based method for pruning n-gram backoff language models. The conventional approach prunes n-grams that are infrequent in the training data. We instead prune n-grams that are likely to be infrequent in a new document. Our method is based on the n-gram distribution, i.e., the probability that an n-gram occurs in a new document. Experimental results show that our method outperforms conventional cutoff methods by 7-9% in word perplexity reduction.
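The contrast between the two pruning criteria can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method. The abstract does not say how the n-gram distribution is estimated, so the sketch uses a simple stand-in: the fraction of training documents that contain the n-gram. All function names (`count_cutoff_prune`, `occurrence_probability`, `distribution_prune`) and the `threshold` parameter are hypothetical. The sketch only selects which n-grams to keep. It does not rebuild the backoff model or reproduce the reported perplexity results.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of length n in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_cutoff_prune(docs: List[List[str]], n: int, cutoff: int) -> Set[NGram]:
    """Conventional baseline: keep n-grams whose total training count exceeds `cutoff`."""
    counts = Counter(g for doc in docs for g in ngrams(doc, n))
    return {g for g, c in counts.items() if c > cutoff}


def occurrence_probability(docs: List[List[str]], n: int) -> Dict[NGram, float]:
    """Hypothetical estimator (an assumption, not from the paper): the share of
    training documents in which each n-gram occurs at least once."""
    if not docs:
        return {}
    # set(...) counts each n-gram once per document (document frequency).
    doc_freq = Counter(g for doc in docs for g in set(ngrams(doc, n)))
    return {g: df / len(docs) for g, df in doc_freq.items()}


def distribution_prune(docs: List[List[str]], n: int, threshold: float) -> Set[NGram]:
    """Keep n-grams whose estimated probability of occurring in a new document
    is at least `threshold` (hypothetical parameter)."""
    probs = occurrence_probability(docs, n)
    return {g for g, p in probs.items() if p >= threshold}


# Toy corpus: ("a", "b") is frequent but confined to one document,
# while ("c", "d") is less frequent overall but appears in most documents.
docs = [
    "a b a b a b a b".split(),
    "c d x".split(),
    "c d y".split(),
    "c d z".split(),
]
print(sorted(count_cutoff_prune(docs, 2, 3)))   # [('a', 'b')]
print(sorted(distribution_prune(docs, 2, 0.5)))  # [('c', 'd')]
```

In this toy corpus, a count cutoff keeps the bursty bigram ("a", "b") because it occurs four times in one document. The distribution-based criterion keeps ("c", "d") instead, because it appears in three of the four documents and is therefore more likely to show up in a new one.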

Jianfeng Gao, Kai-Fu Lee


ACL 2000 | ACL 2007 | N-gram Backoff Language | N-gram Distribution | N-gram Occurs |


Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	2000
Where	ACL
Authors	Jianfeng Gao, Kai-Fu Lee
