KDD
2005
ACM

A hybrid unsupervised approach for document clustering

We propose a hybrid, unsupervised document clustering approach that combines a hierarchical clustering algorithm with Expectation Maximization (EM). We developed several heuristics that automatically select a subset of the clusters generated by the first algorithm as the initial points for the second. Furthermore, our initialization algorithm generates not only an initial model for the iterative refinement algorithm but also an estimate of the model dimension, thus eliminating another important element of human supervision. We evaluated the proposed system on five real-world document collections. The results show that our approach generates clustering solutions of higher quality than either of its individual components.

Categories and Subject Descriptors: H.3.3: Clustering
General Terms: Algorithms
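
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the heuristics, so the stopping distance (`stop_dist`), the minimum cluster size (`min_size`), the centroid linkage, and the fixed-variance spherical Gaussian mixture are all assumptions made for illustration, as is reading "model dimension" as the number of clusters. The sketch uses small 2-D points in place of document vectors.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def agglomerate(points, stop_dist):
    """Centroid-linkage agglomerative clustering (assumed linkage).
    Merging stops once the closest pair of clusters is farther apart
    than stop_dist."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sqdist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > stop_dist ** 2:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def select_seeds(clusters, min_size):
    """Hypothetical selection heuristic: keep only clusters with at least
    min_size members. The number kept doubles as the estimate of k."""
    return [centroid(c) for c in clusters if len(c) >= min_size]

def em(points, seeds, iters=20, var=1.0):
    """EM for a mixture of spherical Gaussians with fixed variance `var`
    (an assumed model), started from the selected seeds."""
    means = [list(m) for m in seeds]
    k, dim = len(means), len(points[0])
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for p in points:
            logs = [math.log(weights[c]) - sqdist(p, means[c]) / (2 * var)
                    for c in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate mixing weights and means.
        for c in range(k):
            mass = max(sum(r[c] for r in resp), 1e-12)  # guard against empty components
            weights[c] = mass / len(points)
            means[c] = [sum(r[c] * p[d] for r, p in zip(resp, points)) / mass
                        for d in range(dim)]
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return means, labels

# Toy data: two tight groups plus one outlier that forms a singleton cluster.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 30]]
clusters = agglomerate(points, stop_dist=3.0)
seeds = select_seeds(clusters, min_size=2)   # the singleton is discarded
means, labels = em(points, seeds)
print("estimated k:", len(seeds))
print("labels:", labels)
```

In this sketch the hierarchical stage only proposes candidates; the size filter decides which of them become EM components, so the number of clusters does not have to be supplied by hand. The outlier is dropped as a seed but is still assigned to a component during refinement.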
Mihai Surdeanu, Jordi Turmo, Alicia Ageno
Added: 30 Nov 2009
Updated: 30 Nov 2009
Type: Conference
Year: 2005
Where: KDD