Sciweavers

ICASSP
2009
IEEE

Resampling auxiliary data for language model adaptation in machine translation for speech

The performance of n-gram language models depends largely on the amount of training text available for building the models and on how well that text matches the domain of interest. The language modeling community is showing growing interest in using large collections of auxiliary text to supplement sparse in-domain resources. One problem with such auxiliary corpora is that they may differ significantly from the domain of interest. In this paper, we propose three methods for adapting language models for a speech-to-speech (S2S) translation system when the auxiliary corpora are of a different genre and domain. The proposed methods are based on centroid similarity, n-gram ratios, and resampled language models. We show how these methods can be used to select out-of-domain textual data, such as newswire text, to improve an S2S system. We were able to achieve an overall relative improvement of 3.8% in BLEU score over ...
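
The abstract names centroid similarity as one basis for selecting auxiliary data but does not give the formulation. As a rough illustration only, the sketch below assumes a bag-of-words representation, a centroid formed by summing in-domain sentence vectors, and a cosine-similarity cutoff; the function names, the threshold value, and the toy sentences are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased whitespace tokens mapped to raw counts."""
    return Counter(sentence.lower().split())


def centroid(sentences):
    """Sum of the bag-of-words vectors (cosine ignores the missing 1/N scale)."""
    total = Counter()
    for s in sentences:
        total.update(bag_of_words(s))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_by_centroid(in_domain, auxiliary, threshold=0.2):
    """Keep auxiliary sentences whose similarity to the in-domain centroid
    meets the threshold. The threshold is an assumed, untuned value."""
    c = centroid(in_domain)
    return [s for s in auxiliary if cosine(bag_of_words(s), c) >= threshold]


# Toy data, invented for illustration.
in_domain = [
    "where is the nearest hospital",
    "i need a doctor",
    "is the hospital open",
]
auxiliary = [
    "the hospital is open today",
    "stocks fell sharply on monday",
]
selected = select_by_centroid(in_domain, auxiliary)
```

In this toy run, the hospital sentence shares several words with the in-domain centroid and is kept, while the newswire-style sentence shares none and is dropped. In practice the selected text would then be used to train or interpolate the adapted language model.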
Sameer Maskey, Abhinav Sethy
Added 21 May 2010
Updated 21 May 2010
Type Conference
Year 2009
Where ICASSP
Authors Sameer Maskey, Abhinav Sethy