EMNLP
2010

Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval

Download www.aclweb.org

We present three novel methods of compactly storing very large n-gram language models. These methods use substantially less space than all known approaches and allow n-gram probabilities or counts to be retrieved in constant time, at speeds comparable to modern language modeling toolkits. Our basic approach generates an explicit minimal perfect hash function that maps all n-grams in a model to distinct integers to enable storage of associated values. Extensions of this approach exploit distributional characteristics of n-gram data to reduce storage costs, including variable-length coding of values and the use of tiered structures that partition the data for more efficient storage. We apply our approach to storing the full Google Web1T n-gram set and all 1-to-5 grams of the Gigaword newswire cor
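The basic idea in the abstract (a minimal perfect hash function that sends each stored n-gram to its own integer slot, with the value kept at that slot) can be illustrated with a small sketch. This is not the authors' construction: it assumes a simple hash-and-displace scheme, BLAKE2b as the hash, and an 8-bit fingerprint per slot to reject most unseen n-grams. The names `build_mphf`, `CompactNgramStore`, `bucket_load`, and `max_seed` are hypothetical, and plain Python lists are used for readability rather than compactness.

```python
import hashlib

FP_SEED = -1  # assumed: a reserved seed used only for fingerprints


def _hash(ngram: str, seed: int) -> int:
    """64-bit hash of an n-gram under an integer seed."""
    data = f"{seed}\x00{ngram}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def build_mphf(keys, bucket_load=4, max_seed=1_000_000):
    """Hash-and-displace: find one seed per bucket so that the keys
    map one-to-one onto range(len(keys)). Keys must be distinct."""
    keys = list(keys)
    n = len(keys)
    num_buckets = max(1, n // bucket_load)
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[_hash(k, 0) % num_buckets].append(k)

    seeds = [0] * num_buckets
    taken = [False] * n
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(num_buckets), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        for seed in range(1, max_seed):
            slots = [_hash(k, seed) % n for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                for s in slots:
                    taken[s] = True
                seeds[b] = seed
                break
        else:
            raise RuntimeError("no seed found; raise max_seed or lower bucket_load")
    return seeds, n


class CompactNgramStore:
    """Values stored by slot; the n-gram strings themselves are not kept."""

    def __init__(self, counts: dict):
        self.seeds, self.n = build_mphf(counts)
        self.values = [0] * self.n
        self.fingerprints = bytearray(self.n)
        for ngram, count in counts.items():
            slot = self.slot(ngram)
            self.values[slot] = count
            self.fingerprints[slot] = _hash(ngram, FP_SEED) & 0xFF

    def slot(self, ngram: str) -> int:
        seed = self.seeds[_hash(ngram, 0) % len(self.seeds)]
        return _hash(ngram, seed) % self.n

    def get(self, ngram: str, default=0):
        """Constant-time lookup: two hashes, one fingerprint check."""
        if self.n == 0:
            return default
        slot = self.slot(ngram)
        if self.fingerprints[slot] != (_hash(ngram, FP_SEED) & 0xFF):
            return default  # almost certainly an unseen n-gram
        return self.values[slot]


# Toy counts, invented for illustration only.
example_counts = {
    "the cat": 12, "cat sat": 7, "sat on": 9, "on the": 31, "the mat": 5,
    "the cat sat": 4, "cat sat on": 3, "sat on the": 6, "on the mat": 2,
    "a dog": 8,
}
store = CompactNgramStore(example_counts)
# Every stored n-gram comes back with its original count.
```

Because the keys are not stored, an unseen n-gram always lands on some occupied slot; with an 8-bit fingerprint roughly 1 in 256 such queries would be accepted by mistake. Widening the fingerprint lowers that rate at the cost of more bits per n-gram.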

David Guthrie, Mark Hepple

EMNLP 2010 | Google Web1T N-gram | N-gram | N-gram Language Models | Natural Language Processing

Post Info

Added	11 Feb 2011
Updated	11 Feb 2011
Type	Conference paper
Year	2010
Where	EMNLP
Authors	David Guthrie, Mark Hepple