An objective evaluation criterion for clustering

We propose and test an objective criterion for evaluating clustering performance: how well does a clustering algorithm, run on unlabeled data, aid a classification algorithm? The accuracy is quantified using the PAC-MDL bound [3] in a semisupervised setting. Clustering algorithms that naturally separate the data according to (hidden) labels with a small number of clusters perform well. A simple extension of the argument leads to an objective model selection method. Experimental results on text analysis datasets demonstrate that this approach empirically results in very competitive bounds on test set performance on natural datasets.
Categories and Subject Descriptors: I.5.3 [Pattern Recognition]: Clustering
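The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code and it does not compute the PAC-MDL bound itself. It assumes the clustering is turned into a classifier by giving each cluster the majority label of its labeled points, and that this classifier is described with one label per cluster. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def label_clusters(cluster_ids, labels):
    """Map each cluster to the majority label among its labeled points."""
    votes = defaultdict(Counter)
    for c, y in zip(cluster_ids, labels):
        votes[c][y] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def training_errors(cluster_ids, labels, mapping):
    """Count labeled points whose cluster label disagrees with their true label."""
    return sum(1 for c, y in zip(cluster_ids, labels) if mapping[c] != y)


def description_bits(num_clusters, num_labels):
    # Assumption: one label per cluster, log2(num_labels) bits each.
    return num_clusters * log2(num_labels)


# Toy data (hypothetical): cluster assignment and true label per labeled point.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["a", "a", "b", "b", "b", "a", "a", "a"]

mapping = label_clusters(cluster_ids, labels)
errors = training_errors(cluster_ids, labels, mapping)
bits = description_bits(len(mapping), len(set(labels)))
print(mapping, errors, bits)
```

Under these assumptions, a clustering with fewer clusters needs fewer bits to describe but may make more errors on the labeled points. A bound of the kind named in the abstract would take both quantities as inputs.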
Arindam Banerjee, John Langford
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2004
Where KDD
Authors Arindam Banerjee, John Langford