Stream-based Randomised Language Models for SMT

Randomised techniques allow very large language models to be represented succinctly. However, because they are batch-based, they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model that uses an online perfect hash function to deal efficiently with unbounded text streams. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based LMs without incurring the computational overhead of full retraining. This opens up the possibility of randomised language models that continuously adapt to the massive volumes of text published on the Web each day.
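To illustrate the general idea behind randomised (succinct) language model storage, the sketch below stores n-gram counts keyed by short hash fingerprints rather than by the n-grams themselves, trading exactness for space: colliding fingerprints can produce false positives. This is only an illustrative sketch of fingerprint-based storage in general; the paper's online perfect hash function is a different, more sophisticated construction, and the class and parameter names here are hypothetical.

```python
import hashlib


class RandomisedNgramCounts:
    """Illustrative sketch (not the authors' method): store n-gram counts
    under short fingerprints so the model is succinct, at the cost of a
    small, tunable false-positive rate on lookups."""

    def __init__(self, fingerprint_bits=16):
        # Fewer bits -> less space but a higher chance two distinct
        # n-grams collide and share a count (a false positive).
        self.fingerprint_bits = fingerprint_bits
        self.table = {}  # fingerprint -> accumulated count

    def _fingerprint(self, ngram):
        digest = hashlib.sha1(" ".join(ngram).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % (1 << self.fingerprint_bits)

    def add(self, ngram, count=1):
        fp = self._fingerprint(ngram)
        self.table[fp] = self.table.get(fp, 0) + count

    def count(self, ngram):
        # May overestimate if another n-gram maps to the same fingerprint.
        return self.table.get(self._fingerprint(ngram), 0)


lm = RandomisedNgramCounts()
lm.add(("the", "cat"))
lm.add(("the", "cat"))
```

A streaming variant, as motivated in the abstract, would additionally need to admit insertions (and deletions of stale counts) online without rebuilding the whole structure, which is what an online perfect hash function provides.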
Abby Levenberg, Miles Osborne
Added 17 Feb 2011
Updated 17 Feb 2011
Type Journal
Year 2009