
COLT
2008
Springer

Model Selection and Stability in k-means Clustering

Clustering stability methods are a family of widely used model selection techniques in data clustering. Their unifying theme is that an appropriate model should result in a clustering that is robust to various kinds of perturbations, as measured by a suitable instability measure. Despite their relative success, little is known theoretically about why or when they work, or even what kind of assumptions they make in choosing an 'appropriate' model. In this paper, we focus on the behavior of clustering stability using k-means clustering. Our main technical result is an exact characterization of the value to which appropriately scaled measures of instability converge, based on a sample drawn from any distribution in ℝⁿ satisfying mild regularity conditions. Besides resolving a theoretical obstacle which has been raised in the literature, it allows us to make several interesting observations about what kind of assumptions are actually made when using these me...
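To make the general idea concrete, here is a minimal sketch of stability-based selection of k for k-means. It is not the authors' method and does not implement their scaled instability measure. It assumes one common variant: draw two bootstrap subsamples, cluster each with Lloyd's algorithm (k-means++ seeding, several restarts), label the full data set by nearest center, and report the fraction of point pairs whose co-membership differs. All function names (`lloyd`, `instability`, `pair_disagreement`) and parameters are hypothetical.

```python
import random

def sq_dist(a, b):
    # squared Euclidean distance between two points given as tuples
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    # index of the closest center (ties go to the lowest index)
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def kmeans_pp_init(points, k, rng):
    # k-means++ seeding: sample each new center with probability proportional to D^2
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def lloyd(points, k, rng, n_init=5, max_iter=100):
    # Lloyd's algorithm with restarts; keeps the centers with the lowest cost
    best_centers, best_cost = None, float("inf")
    for _ in range(n_init):
        centers = kmeans_pp_init(points, k, rng)
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                clusters[nearest(p, centers)].append(p)
            new_centers = [
                tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[j]
                for j, c in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        cost = sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers

def pair_disagreement(labels_a, labels_b):
    # fraction of point pairs grouped together in one labeling but not the other
    n = len(labels_a)
    disagree = 0
    for i in range(n):
        for j in range(i + 1, n):
            disagree += (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
    return disagree / (n * (n - 1) / 2)

def instability(points, k, n_trials=20, sample_size=None, seed=0):
    # average disagreement between clusterings fitted on two bootstrap subsamples
    rng = random.Random(seed)
    m = sample_size or len(points)
    total = 0.0
    for _ in range(n_trials):
        s1 = [rng.choice(points) for _ in range(m)]
        s2 = [rng.choice(points) for _ in range(m)]
        c1, c2 = lloyd(s1, k, rng), lloyd(s2, k, rng)
        l1 = [nearest(p, c1) for p in points]
        l2 = [nearest(p, c2) for p in points]
        total += pair_disagreement(l1, l2)
    return total / n_trials

# Illustrative data: three well-separated, symmetric 1-D groups of 20 points each.
data = [(c + 0.05 * (j - 9.5),) for c in (0.0, 10.0, 20.0) for j in range(20)]
for k in (2, 3, 4):
    print(k, round(instability(data, k, sample_size=30), 3))
```

With three symmetric groups, k = 2 must merge two neighbouring groups, and which pair gets merged depends on the subsample, so instability should be noticeably higher than at k = 3, where each group is recovered. Under this heuristic, the k with the lowest instability would be preferred. The abstract's point is that what such a rule actually assumes is not obvious.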
Ohad Shamir, Naftali Tishby
Type Conference
Year 2008
Where COLT
Authors Ohad Shamir, Naftali Tishby