
ICASSP
2009
IEEE

Incorporating monolingual corpora into bilingual latent semantic analysis for crosslingual LM adaptation

The major limitation of bilingual latent semantic analysis (bLSA) is its requirement for parallel training corpora. Motivated by semi-supervised learning, we propose a cluster-based bLSA training approach that incorporates monolingual corpora. Each parallel document pair is treated as the centroid of a parallel document cluster, and each monolingual document is assigned to the closest centroid according to topic similarity. The resulting parallel document clusters are used as constraints to enforce a one-to-one topic correspondence in variational EM. A slight performance improvement in crosslingual language model adaptation is observed compared with the baseline trained without monolingual corpora.
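The cluster-assignment step can be illustrated with a small sketch. The code is not the authors' implementation. It assumes the following:

- Each parallel document pair and each monolingual document is represented by a topic vector of the same length K, for example a topic posterior inferred with a bLSA model.
- "Topic similarity" is measured with cosine similarity.

The function names `assign_to_clusters` and `build_clusters` are hypothetical.

```python
import numpy as np


def assign_to_clusters(pair_topics, mono_topics):
    """Assign each monolingual document to its most similar parallel-pair centroid.

    Hypothetical helper. Assumes topic vectors as inputs and cosine similarity
    as the topic-similarity measure.
    pair_topics: (P, K) array, one topic vector per parallel document pair.
    mono_topics: (M, K) array, one topic vector per monolingual document.
    Returns an (M,) array holding the index of the closest centroid.
    """
    pair = np.asarray(pair_topics, dtype=float)
    mono = np.asarray(mono_topics, dtype=float)
    # L2-normalise each row so that a dot product equals cosine similarity.
    pair_n = pair / np.linalg.norm(pair, axis=1, keepdims=True)
    mono_n = mono / np.linalg.norm(mono, axis=1, keepdims=True)
    sim = mono_n @ pair_n.T  # (M, P) similarity matrix
    return sim.argmax(axis=1)


def build_clusters(assignments, num_pairs):
    """Group monolingual document indices under the parallel pair they joined."""
    clusters = {p: [] for p in range(num_pairs)}
    for doc_idx, p in enumerate(assignments):
        clusters[int(p)].append(doc_idx)
    return clusters


if __name__ == "__main__":
    # Two parallel pairs (centroids) and three monolingual documents, K = 2 topics.
    pair_topics = [[0.9, 0.1], [0.2, 0.8]]
    mono_topics = [[0.8, 0.2], [0.1, 0.9], [0.6, 0.4]]
    assignments = assign_to_clusters(pair_topics, mono_topics)
    print(assignments.tolist())                         # [0, 1, 0]
    print(build_clusters(assignments, len(pair_topics)))  # {0: [0, 2], 1: [1]}
```

In the approach described above, each resulting cluster (one parallel pair plus its assigned monolingual documents) then serves as a constraint during variational EM training. That constraint step is not shown here.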
Yik-Cheung Tam, Tanja Schultz
Added 21 May 2010
Updated 21 May 2010
Type Conference
Year 2009
Where ICASSP