Multilingual document clusters discovery

The Cross-Language Information Retrieval community has produced search engines over multilingual corpora, as well as multilingual text categorization systems. In this paper, we focus on the multilingual cluster discovery problem, whose aim is to extract topic-related multilingual document clusters from a multilingual document collection in an unsupervised way. Our approach is based on a linguistic analysis of the documents, which allows us to identify relevant features for a vector representation of the documents, each language being associated with a different vector space. We propose a cross-lingual similarity measure for the documents, using bilingual dictionaries. A Shared Nearest Neighbor clustering algorithm is then used to build the clusters. We present an evaluation framework for this task, analyze and discuss the results we obtained, and propose directions for future work.
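The pipeline outlined in the abstract (per-language term vectors, a dictionary-based cross-lingual similarity, then Shared Nearest Neighbor clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary `FR_EN`, the sample documents `DOCS`, the uniform splitting of weight across translations, and the simplified Jarvis-Patrick style linking rule with its parameters `k` and `min_shared` are all assumptions made for illustration, and the sketch handles only a French/English pair.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy French -> English dictionary (not taken from the paper).
FR_EN = {
    "élection": ["election"],
    "président": ["president"],
    "match": ["match", "game"],
    "équipe": ["team"],
}

# Hypothetical documents: (language, term-weight vector in that language's space).
DOCS = [
    ("en", {"election": 2, "president": 1}),
    ("fr", {"élection": 1, "président": 2}),
    ("en", {"president": 1, "election": 1}),
    ("en", {"team": 2, "game": 1}),
    ("fr", {"équipe": 1, "match": 1}),
    ("en", {"match": 1, "team": 1}),
]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def translate(vec, dictionary):
    """Project a vector into the other language's space.

    Assumption: a term's weight is split evenly across its translations;
    terms missing from the dictionary are dropped.
    """
    out = Counter()
    for term, w in vec.items():
        translations = dictionary.get(term, [])
        for tr in translations:
            out[tr] += w / len(translations)
    return out


def similarity(doc_a, doc_b):
    """Same-language pairs use plain cosine; fr/en pairs go through FR_EN."""
    (lang_a, vec_a), (lang_b, vec_b) = doc_a, doc_b
    if lang_a != lang_b:
        if lang_a == "fr":
            vec_a = translate(vec_a, FR_EN)
        else:
            vec_b = translate(vec_b, FR_EN)
    return cosine(vec_a, vec_b)


def snn_clusters(docs, k=2, min_shared=1):
    """Simplified Shared Nearest Neighbor clustering.

    Two documents are linked when each is in the other's k-nearest-neighbor
    list and the two lists share at least `min_shared` members; clusters are
    the connected components of the resulting graph.
    """
    n = len(docs)
    nn = []
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: similarity(docs[i], docs[j]),
                        reverse=True)
        nn.append(set(ranked[:k]))

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j] and len(nn[i] & nn[j]) >= min_shared:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(snn_clusters(DOCS))  # [[0, 1, 2], [3, 4, 5]]
```

On the toy data, the politics documents (indices 0 to 2) and the sports documents (indices 3 to 5) each form one cluster that mixes French and English, because translation through the dictionary gives cross-language pairs a nonzero cosine while unrelated topics score zero.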
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where RIAO
Authors Benoît Mathieu, Romaric Besançon, Christian Fluhr