Sciweavers

NAACL
2010

Stream-based Translation Models for Statistical Machine Translation

Typical statistical machine translation systems are trained with static parallel corpora. Here we account for scenarios with a continuous incoming stream of parallel training data. Such scenarios include daily governmental proceedings, sustained output from translation agencies, or crowd-sourced translations. We show that incorporating recent sentence pairs from the stream improves performance compared with a static baseline. Since frequent batch retraining is computationally demanding, we introduce a fast incremental alternative using an online version of the EM algorithm. To bound our memory requirements, we use a novel data structure and associated training regime. When compared to frequent batch retraining, our online, time- and space-bounded model achieves the same performance with significantly less computational overhead.
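To illustrate the general idea of an online EM update on streaming sentence pairs, here is a minimal sketch. It is not the authors' implementation: it assumes a toy IBM Model 1 lexical translation table, a stepwise update that interpolates old and new expected counts, and a step size of (k + 2)^(-alpha). The class name `StepwiseModel1`, the `floor` value for unseen word pairs, and the example sentences are all hypothetical. The sketch also makes no attempt to bound memory.

```python
class StepwiseModel1:
    """Toy IBM Model 1 trained with a stepwise online EM update (illustrative only)."""

    def __init__(self, alpha=0.7, floor=1e-3):
        self.alpha = alpha    # step-size decay exponent (assumed value)
        self.floor = floor    # score used for word pairs never seen so far (assumed)
        self.k = 0            # number of mini-batches processed
        self.counts = {}      # interpolated expected counts for (f, e)
        self.totals = {}      # interpolated expected counts for e

    def prob(self, f, e):
        """Return t(f | e), or the floor score if the pair is unseen."""
        c = self.counts.get((f, e))
        if c is None:
            return self.floor
        return c / self.totals[e]

    def update(self, batch):
        """Process one mini-batch of (source, target) sentence pairs."""
        # E-step on the new batch only: collect expected alignment counts.
        batch_counts, batch_totals = {}, {}
        for src, tgt in batch:
            e_words, f_words = src.split(), tgt.split()
            for f in f_words:
                z = sum(self.prob(f, e) for e in e_words)
                for e in e_words:
                    p = self.prob(f, e) / z
                    batch_counts[(f, e)] = batch_counts.get((f, e), 0.0) + p
                    batch_totals[e] = batch_totals.get(e, 0.0) + p

        # Stepwise M-step: blend old statistics with the batch statistics.
        eta = (self.k + 2) ** (-self.alpha)
        for stats, new in ((self.counts, batch_counts), (self.totals, batch_totals)):
            for key in stats:
                stats[key] *= 1.0 - eta
            for key, value in new.items():
                stats[key] = stats.get(key, 0.0) + eta * value
        self.k += 1


# Hypothetical stream: the same three sentence pairs arriving repeatedly.
stream = [("the house", "das haus"), ("the book", "das buch"), ("a book", "ein buch")]
model = StepwiseModel1()
for _ in range(20):
    model.update(stream)
```

Because each update touches only the incoming batch, the cost per update depends on the batch size rather than on the full history, which is the property that makes this style of training attractive compared with retraining from scratch.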
Abby Levenberg, Chris Callison-Burch, Miles Osborne
Added 14 Feb 2011
Updated 14 Feb 2011
Type Journal
Year 2010
Where NAACL
Authors Abby Levenberg, Chris Callison-Burch, Miles Osborne