Sciweavers

SIGIR
2002
ACM

Document clustering with cluster refinement and model selection capabilities

In this paper, we propose a document clustering method that aims to achieve (1) high document clustering accuracy and (2) the ability to estimate the number of clusters in the document corpus (i.e., the model selection capability). To cluster the given corpus accurately, we represent each document with a richer feature set and use the Gaussian Mixture Model (GMM) together with the Expectation-Maximization (EM) algorithm to produce an initial clustering. From this initial result, we identify a set of discriminative features for each cluster, and refine the initial clusters by voting on the cluster label of each document using this discriminative feature set. This self-refinement process of discriminative feature identification and cluster label voting is applied iteratively until the document clusters converge. The model selection capability, in turn, is achieved by introducing randomness in the cluster initializa...
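The self-refinement loop described in the abstract (identify discriminative features per cluster, vote on each document's label, repeat until the labels stop changing) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define how discriminative features are scored, so the `min_share` document-frequency criterion, the tie-breaking rule, and all function names here are assumptions. The initial labels are taken as given; in the paper they would come from the GMM/EM step.

```python
from collections import Counter

def discriminative_features(docs, labels, k, min_share=0.6):
    """For each cluster, pick terms whose document frequency is concentrated in it.

    ASSUMPTION: the min_share criterion is illustrative only; the abstract
    does not say how the paper scores discriminative features.
    """
    total = Counter()
    per_cluster = [Counter() for _ in range(k)]
    for terms, label in zip(docs, labels):
        for t in set(terms):
            total[t] += 1
            per_cluster[label][t] += 1
    return [
        {t for t, n in per_cluster[c].items() if n / total[t] >= min_share}
        for c in range(k)
    ]

def vote(terms, features, current):
    """Return the cluster whose discriminative terms occur most in the document."""
    votes = [sum(1 for t in terms if t in f) for f in features]
    best = max(votes)
    # Keep the current label on ties or when no discriminative term is present.
    if best == 0 or votes[current] == best:
        return current
    return votes.index(best)

def refine(docs, labels, k, max_iter=20):
    """Alternate feature identification and label voting until labels converge."""
    labels = list(labels)
    for _ in range(max_iter):
        features = discriminative_features(docs, labels, k)
        new_labels = [vote(d, features, l) for d, l in zip(docs, labels)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Toy corpus: the last document is deliberately mislabelled as cluster 1.
docs = [
    ["gmm", "em", "cluster"],
    ["gmm", "mixture", "em"],
    ["soccer", "goal", "match"],
    ["soccer", "match", "league"],
    ["gmm", "em", "mixture"],
]
print(refine(docs, [0, 0, 1, 1, 1], k=2))  # prints [0, 0, 1, 1, 0]
```

In this toy run the mislabelled document moves to cluster 0 after one voting pass, and the second pass changes nothing, so the loop stops.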
Xin Liu, Yihong Gong, Wei Xu, Shenghuo Zhu
Type Journal
Year 2002
Where SIGIR