Sciweavers

COLT
2008
Springer

Relating Clustering Stability to Properties of Cluster Boundaries

In this paper, we investigate stability-based methods for cluster model selection, in particular for selecting the number K of clusters. The scenario under consideration is that clustering is performed by minimizing a certain clustering quality function, and that a unique global minimizer exists. On the one hand, we show that stability can be upper bounded by certain properties of the optimal clustering, namely by the mass in a small tube around the cluster boundaries. On the other hand, we provide counterexamples which show that the reverse statement does not hold in general. Finally, we give some examples and arguments why, from a theoretical point of view, using clustering stability in a large-sample setting can be problematic. Distribution-free guarantees bounding the difference between the finite sample stability and the "true stability" cannot exist unless one makes strong assumptions on the underlying distribution.
Shai Ben-David, Ulrike von Luxburg
Added 18 Oct 2010
Updated 18 Oct 2010
Type Conference
Year 2008
Where COLT
Authors Shai Ben-David, Ulrike von Luxburg