Sciweavers

ICDM
2008
IEEE

Clustering Documents with Active Learning Using Wikipedia

13 years 11 months ago
Clustering Documents with Active Learning Using Wikipedia
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated to a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding clustering towards the direction indicated by the constraints. We test our approach on three standard text document dataset...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit
Added 30 May 2010
Updated 30 May 2010
Type Conference
Year 2008
Where ICDM
Authors Anna Huang, David N. Milne, Eibe Frank, Ian H. Witten
Comments (0)