Mining multilingual topics from Wikipedia

In this paper, we leverage a large-scale, multilingual knowledge base, Wikipedia, to help analyze and organize Web information written in different languages. Based on the observation that one Wikipedia concept may be described by articles in different languages, we adapt an existing topic modeling algorithm to mine multilingual topics from this knowledge base. The extracted "universal" topics have multiple types of representations, with each type corresponding to one language. Accordingly, new documents in different languages can be represented in a common space spanned by a group of universal topics, which makes various multilingual Web applications feasible.
Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing
General Terms Algorithms, Performance, Experimentation
Keywords Multilingual, Wikipedia, Topic Modeling, Universal-topics
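To make the abstract's idea of a shared topic space concrete, here is a minimal Python sketch. It is not the authors' algorithm: the toy topics, their word probabilities, and the helper names `topic_vector` and `cosine` are all hypothetical, and the scoring rule (summed word probability per topic) is a deliberate simplification of real topic-model inference. It only illustrates that when each universal topic carries one word distribution per language, documents in different languages map to vectors over the same topics and can be compared directly.

```python
import math

# Hypothetical toy "universal" topics: one word distribution per language.
# A real system would learn these from linked Wikipedia articles.
UNIVERSAL_TOPICS = [
    {"en": {"football": 0.5, "goal": 0.3, "team": 0.2},
     "de": {"fussball": 0.5, "tor": 0.3, "mannschaft": 0.2}},
    {"en": {"election": 0.5, "vote": 0.3, "party": 0.2},
     "de": {"wahl": 0.5, "stimme": 0.3, "partei": 0.2}},
]


def topic_vector(tokens, lang, topics=UNIVERSAL_TOPICS):
    """Map a tokenized document to a vector over the universal topics.

    Simplified scoring (an assumption, not proper inference): sum the
    probability of each token under the topic's distribution for `lang`,
    then normalize the scores so they sum to 1.
    """
    scores = [sum(topic[lang].get(tok, 0.0) for tok in tokens) for topic in topics]
    total = sum(scores)
    if total == 0:
        # No known words: fall back to a uniform vector.
        return [1.0 / len(topics)] * len(topics)
    return [s / total for s in scores]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


en_sports = topic_vector("the team scored a goal".split(), "en")
de_sports = topic_vector("die mannschaft schoss ein tor".split(), "de")
de_politics = topic_vector("die partei gewann die wahl".split(), "de")

sim_same_topic = cosine(en_sports, de_sports)
sim_diff_topic = cosine(en_sports, de_politics)
```

In this toy setup the English and German sports sentences land on the same topic, so `sim_same_topic` is high, while the German politics sentence lands on the other topic, so `sim_diff_topic` is low, even though the documents share no vocabulary.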
Xiaochuan Ni, Jian-Tao Sun, Jian Hu, Zheng Chen
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2009
Where WWW