
COLT
2008
Springer

Relating Clustering Stability to Properties of Cluster Boundaries

In this paper, we investigate stability-based methods for cluster model selection, in particular for selecting the number K of clusters. The scenario under consideration is that clustering is performed by minimizing a certain clustering quality function, and that a unique global minimizer exists. On the one hand, we show that stability can be upper bounded by certain properties of the optimal clustering, namely by the mass in a small tube around the cluster boundaries. On the other hand, we provide counterexamples which show that the reverse statement is not true in general. Finally, we give some examples and arguments for why, from a theoretical point of view, using clustering stability in a high sample setting can be problematic. In particular, distribution-free guarantees bounding the difference between the finite sample stability and the "true stability" cannot exist unless one makes strong assumptions on the underlying distribution.
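The two quantities the abstract relates can be illustrated with a small sketch. This is not the authors' construction: it assumes 1-D data, plain Lloyd-style K-means as the quality-minimizing algorithm, a two-component Gaussian mixture as the distribution, and hypothetical helper names (`kmeans_1d`, `instability`, `tube_mass`, `draw`). Instability is estimated as the fraction of evaluation points on which two clusterings, fitted on independent samples, disagree. Labels are aligned by sorting the centers rather than by minimizing over label permutations, which is a simplification. The tube mass is the fraction of points within distance `gamma` of a cluster boundary.

```python
import random


def kmeans_1d(points, k, iters=50):
    """Lloyd's algorithm on 1-D data; returns the k centers in sorted order."""
    pts = sorted(points)
    # Deterministic initialization at evenly spaced quantiles (an assumption).
    centers = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous center.
        new = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def assign(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda c: abs(x - centers[c]))


def instability(sample_a, sample_b, eval_points, k):
    """Fraction of eval_points labelled differently by the two fitted clusterings."""
    ca, cb = kmeans_1d(sample_a, k), kmeans_1d(sample_b, k)
    disagree = sum(assign(x, ca) != assign(x, cb) for x in eval_points)
    return disagree / len(eval_points)


def tube_mass(eval_points, centers, gamma):
    """Empirical mass within distance gamma of any boundary between sorted centers."""
    boundaries = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    inside = sum(any(abs(x - b) <= gamma for b in boundaries) for x in eval_points)
    return inside / len(eval_points)


def draw(n, sep, rng):
    """n points from an equal-weight mixture of N(-sep/2, 1) and N(+sep/2, 1)."""
    return [rng.gauss(sep / 2 if rng.random() < 0.5 else -sep / 2, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for sep in (8.0, 1.0):  # well-separated vs. strongly overlapping clusters
        eval_points = draw(2000, sep, rng)
        centers = kmeans_1d(eval_points, 2)
        print("separation", sep)
        print("  tube mass:  ", tube_mass(eval_points, centers, gamma=0.25))
        print("  instability:", instability(draw(200, sep, rng), draw(200, sep, rng), eval_points, 2))
```

Under these assumptions, well-separated components leave almost no mass near the boundary, so refitting on a fresh sample relabels very few points. With overlapping components, many points sit close to the boundary and small shifts in the fitted centers change their labels.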
Shai Ben-David, Ulrike von Luxburg
Added 18 Oct 2010
Updated 18 Oct 2010
Type Conference