PAKDD
2009
ACM

Clustering Documents Using a Wikipedia-Based Concept Representation

Abstract. This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. We then develop a similarity measure that evaluates the semantic relatedness between the concept sets of two documents. We test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that, although further optimizations could be performed, our approach already improves upon related techniques.
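The two steps named in the abstract (mapping phrases to Wikipedia concepts, then comparing concept sets) can be illustrated with a minimal sketch. This is not the authors' implementation: the phrase table, the inlink sets, the greedy longest-match lookup, the Jaccard-based relatedness, and the averaged best-match set similarity are all assumptions chosen for illustration, and every name below is hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy data: phrase -> Wikipedia concept (article title).
# A real system would derive this from Wikipedia titles and anchor texts.
PHRASE_TO_CONCEPT: Dict[str, str] = {
    "document clustering": "Document clustering",
    "wikipedia": "Wikipedia",
    "text mining": "Text mining",
    "clustering": "Cluster analysis",
}

# Hypothetical toy data: concept -> identifiers of articles linking to it.
INLINKS: Dict[str, FrozenSet[str]] = {
    "Document clustering": frozenset({"A", "B", "C"}),
    "Cluster analysis": frozenset({"B", "C", "D"}),
    "Text mining": frozenset({"A", "B", "E"}),
    "Wikipedia": frozenset({"F"}),
}

MAX_PHRASE_LEN = 3  # longest phrase (in tokens) tried at each position


def map_to_concepts(text: str) -> Set[str]:
    """Greedy longest-match mapping of phrases to concepts (assumed strategy)."""
    tokens = text.lower().split()  # whitespace tokenization only
    concepts: Set[str] = set()
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in PHRASE_TO_CONCEPT:
                concepts.add(PHRASE_TO_CONCEPT[phrase])
                i += n
                break
        else:
            i += 1  # no phrase starts here; skip this token
    return concepts


def relatedness(a: str, b: str) -> float:
    """Jaccard overlap of inlink sets, a stand-in for a real relatedness measure."""
    if a == b:
        return 1.0
    links_a = INLINKS.get(a, frozenset())
    links_b = INLINKS.get(b, frozenset())
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def concept_set_similarity(x: Set[str], y: Set[str]) -> float:
    """Symmetric average of each concept's best match in the other set."""
    if not x or not y:
        return 0.0

    def directed(src: Set[str], dst: Set[str]) -> float:
        return sum(max(relatedness(c, d) for d in dst) for c in src) / len(src)

    return (directed(x, y) + directed(y, x)) / 2


doc1 = map_to_concepts("document clustering with wikipedia")
doc2 = map_to_concepts("clustering and text mining")
similarity = concept_set_similarity(doc1, doc2)
```

In this toy setup the two documents share no concept, yet they still receive a nonzero similarity because their concepts have overlapping inlinks. That is the general motivation for comparing documents through related concepts rather than through exact term overlap.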
Added 20 May 2010
Updated 20 May 2010
Type Conference
Year 2009
Where PAKDD
Authors Anna Huang, David N. Milne, Eibe Frank, Ian H. Witten