Cross-Lingual Latent Topic Extraction

10 years 2 months ago
Cross-Lingual Latent Topic Extraction
Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other. In this paper, we propose a way to incorporate a bilingual dictionary into a probabilistic topic model so that we can apply topic models to extract shared latent topics in text data of different languages. Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft constraints defined based on a bilingual dictionary. Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multiling...
Duo Zhang, Qiaozhu Mei, ChengXiang Zhai
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where ACL
Authors Duo Zhang, Qiaozhu Mei, ChengXiang Zhai
Comments (0)