CIKM
2009
Springer

Cross-language linking of news stories on the web using interlingual topic modelling

We study the problem of linking event information across languages without using translation systems or dictionaries. The linking relies on interlingual information obtained from probabilistic topic models trained on comparable corpora written in two languages (in our case, English and Dutch). To achieve this, we extend the Latent Dirichlet Allocation (LDA) model to process documents in two languages. We demonstrate the validity of the learned interlingual topics in a document clustering task evaluated on Google News.

Categories and Subject Descriptors: G.3 [Probability and Statistics]: Stochastic Processes; I.2.7 [Artificial Intelligence]: Natural Language Processing - Machine translation; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Clustering
General Terms: Algorithms, Measurement
Keywords: Latent Dirichlet Allocation, Event Detection
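The abstract says only that LDA is extended to documents in two languages. One common way to do this, assumed here and not taken from the paper, is to give each comparable English/Dutch document pair a single shared topic mixture while each language keeps its own topic-word distributions. A monolingual story can then be folded in against its own language's topics and compared with stories in the other language in the shared topic space. The sketch below uses collapsed Gibbs sampling for training, an EM-style fold-in for new documents, and Hellinger distance with nearest-neighbour matching as a simple stand-in for the paper's clustering step. All function names, hyperparameters (`alpha`, `beta`, `iters`) and the toy corpus are hypothetical.

```python
import math
import random
from collections import defaultdict

LANGS = ("en", "nl")  # assumed language pair, matching the English/Dutch setting


def train_bilingual_lda(pairs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for a bilingual LDA variant (illustrative sketch).
    `pairs` is a list of (english_tokens, dutch_tokens); both halves of a pair share
    one topic mixture. Returns phi[lang][k][word], per-language topic-word distributions."""
    rng = random.Random(seed)
    vocab = {lang: sorted({w for p in pairs for w in p[i]}) for i, lang in enumerate(LANGS)}
    n_dk = [[0] * num_topics for _ in pairs]  # topic counts per pair, shared by both languages
    n_kw = {lang: [defaultdict(int) for _ in range(num_topics)] for lang in LANGS}
    n_k = {lang: [0] * num_topics for lang in LANGS}
    tokens = []  # mutable [pair index, language, word, topic] records

    def bump(d, lang, w, k, delta):
        n_dk[d][k] += delta
        n_kw[lang][k][w] += delta
        n_k[lang][k] += delta

    # Random initial topic for every token.
    for d, pair in enumerate(pairs):
        for i, lang in enumerate(LANGS):
            for w in pair[i]:
                k = rng.randrange(num_topics)
                tokens.append([d, lang, w, k])
                bump(d, lang, w, k, +1)

    for _ in range(iters):
        for tok in tokens:
            d, lang, w, k = tok
            bump(d, lang, w, k, -1)  # take the token out of the counts
            v = len(vocab[lang])
            # p(z = t | rest) is proportional to the shared pair-topic term
            # times the language-specific topic-word term.
            weights = [
                (n_dk[d][t] + alpha) * (n_kw[lang][t][w] + beta) / (n_k[lang][t] + v * beta)
                for t in range(num_topics)
            ]
            k = rng.choices(range(num_topics), weights=weights)[0]
            tok[3] = k
            bump(d, lang, w, k, +1)

    return {
        lang: [
            {w: (n_kw[lang][k][w] + beta) / (n_k[lang][k] + len(vocab[lang]) * beta) for w in vocab[lang]}
            for k in range(num_topics)
        ]
        for lang in LANGS
    }


def infer_theta(tokens, phi_lang, alpha=0.1, iters=50):
    """Fold in one monolingual document with phi fixed (EM-style, alpha as pseudo-counts)."""
    num_topics = len(phi_lang)
    theta = [1.0 / num_topics] * num_topics
    known = [w for w in tokens if w in phi_lang[0]]  # words unseen in training are ignored
    for _ in range(iters):
        counts = [alpha] * num_topics
        for w in known:
            resp = [theta[k] * phi_lang[k][w] for k in range(num_topics)]
            total = sum(resp)
            for k in range(num_topics):
                counts[k] += resp[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def link(en_docs, nl_docs, phi, alpha=0.1):
    """For each English document, return the index of the closest Dutch document in topic space."""
    th_nl = [infer_theta(doc, phi["nl"], alpha) for doc in nl_docs]
    links = []
    for doc in en_docs:
        th_en = infer_theta(doc, phi["en"], alpha)
        links.append(min(range(len(nl_docs)), key=lambda j: hellinger(th_en, th_nl[j])))
    return links


if __name__ == "__main__":
    # Hypothetical toy comparable corpus: two sports pairs and two economy pairs.
    pairs = [
        ("football match goal team goal".split(), "voetbal wedstrijd doelpunt ploeg doelpunt".split()),
        ("team match football coach".split(), "ploeg wedstrijd voetbal trainer".split()),
        ("bank market stock price".split(), "bank markt aandeel prijs".split()),
        ("stock price market crisis".split(), "aandeel prijs markt crisis".split()),
    ]
    phi = train_bilingual_lda(pairs, num_topics=2)
    english = [["goal", "football"], ["stock", "bank"]]
    dutch = [["markt", "prijs"], ["doelpunt", "voetbal"]]
    print(link(english, dutch, phi))  # index of the closest Dutch story for each English story
```

The shared per-pair mixture is what makes the topics interlingual in this sketch. Because both halves of a comparable pair are generated from the same mixture, topic k in the English distributions and topic k in the Dutch distributions are pushed to describe the same subject, so no dictionary or translation is needed at linking time. Gibbs sampling is stochastic, so on real data one would usually average over several samples or restarts rather than rely on a single final state.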
Wim De Smet, Marie-Francine Moens
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where CIKM
Authors Wim De Smet, Marie-Francine Moens