ACL
2007


Randomised Language Modelling for Statistical Machine Translation

A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds, but it produces false positives with some quantifiable probability. Here we explore the use of BFs for language modelling in statistical machine translation. We show how a BF containing n-grams can enable us to use much larger corpora and higher-order models, complementing a conventional n-gram LM within an SMT system. We also consider (i) how to include approximate frequency information efficiently within a BF and (ii) how to reduce the error rate of these models by first checking for lower-order sub-sequences in candidate n-grams. Our solutions in both cases retain the one-sided error guarantees of the BF while taking advantage of the Zipf-like distribution of word frequencies to reduce the space requirements.
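
As a rough illustration of the ideas in the abstract, the Python sketch below shows a Bloom filter holding n-grams, a query that checks lower-order sub-sequences before the full n-gram, and one possible way to attach approximate counts. Everything specific here is an assumption made for illustration and is not taken from the abstract: the names (`BloomFilter`, `add_ngram`, `maybe_contains`, `add_with_count`, `approx_count`), the BLAKE2b-based hashing, the filter size, and the unary encoding of a base-2 quantised count. It should not be read as the authors' implementation.

```python
import hashlib
import math


class BloomFilter:
    """Bit array with k hash functions: false positives possible, false negatives not."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting one hash function (illustrative choice).
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            salt = seed.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def false_positive_rate(num_bits, num_hashes, num_items):
    """Standard approximation (1 - e^(-kn/m))^k for a Bloom filter."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def subsequences(tokens):
    """Contiguous sub-sequences, shortest first, ending with the full n-gram."""
    n = len(tokens)
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            yield " ".join(tokens[start:start + length])


def add_ngram(bf, tokens):
    # Assumption: every sub-sequence of a stored n-gram is stored too,
    # so the sub-sequence check below can never reject a stored n-gram.
    for sub in subsequences(tokens):
        bf.add(sub)


def maybe_contains(bf, tokens):
    # Check lower-order sub-sequences first; all() stops at the first miss.
    return all(sub in bf for sub in subsequences(tokens))


def add_with_count(bf, ngram, count):
    # Hypothetical scheme: store one entry per quantisation level up to
    # 1 + floor(log2(count)), i.e. a unary code of the quantised count.
    for level in range(1, count.bit_length() + 1):
        bf.add(f"{ngram}\t{level}")


def approx_count(bf, ngram, max_level=64):
    # Walk up the levels until one is missing; a false positive can only
    # push the estimate up, so the error stays one-sided.
    level = 0
    while level < max_level and f"{ngram}\t{level + 1}" in bf:
        level += 1
    return 0 if level == 0 else 2 ** (level - 1)


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1 << 16, num_hashes=3)
    add_ngram(bf, ["the", "cat", "sat"])
    add_with_count(bf, "the cat", 5)
    print(maybe_contains(bf, ["the", "cat", "sat"]))
    print(approx_count(bf, "the cat"))
    print(false_positive_rate(1 << 16, 3, 10))
```

In this sketch a stored n-gram is always reported as present, and a reported count is never below the lower edge of its stored quantisation bucket, which mirrors the one-sided error property the abstract describes. Rare n-grams need only one or two count entries, which is how a skewed, Zipf-like frequency distribution would keep the extra space small under this assumed encoding.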

David Talbot, Miles Osborne

ACL 2007 | BF Containing N-grams | Computational Linguistics | Information-theoretic Lower Bounds | Space Requirements |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ACL
Authors	David Talbot, Miles Osborne
