Spectral Clustering with Perturbed Data

Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling, etc.) before invoking a spectral algorithm. In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. From this result we derive approximate upper bounds on the clustering error. We show that these bounds are tight empirically across a wide range of problems, suggesting that they can be used in practical settings to determine the amount of data reduction allowed in order to ...
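The relationship described in the abstract can be explored numerically. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian affinity, the symmetric normalized Laplacian, a two-way split by the sign of the second eigenvector, and quantization by rounding to a uniform grid. None of these choices are stated in the abstract, and the names `second_eigenvector`, `bipartition`, `compare` and the parameters `sigma` and `step` are hypothetical.

```python
import numpy as np


def second_eigenvector(X, sigma=1.0):
    """Unit-norm second eigenvector of the symmetric normalized Laplacian.

    Assumption: Gaussian affinity with bandwidth `sigma` and no self-loops.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    v = vecs[:, 1]
    return v / np.linalg.norm(v)


def bipartition(v):
    """Two-way clustering by the sign of the eigenvector entries."""
    return (v > 0).astype(int)


def compare(X, X_tilde, sigma=1.0):
    """Return (squared eigenvector perturbation, mis-clustering rate)."""
    v = second_eigenvector(X, sigma)
    v_tilde = second_eigenvector(X_tilde, sigma)
    if v @ v_tilde < 0:  # eigenvectors are only defined up to sign
        v_tilde = -v_tilde
    eig_pert = float(np.sum((v - v_tilde) ** 2))
    disagree = float(np.mean(bipartition(v) != bipartition(v_tilde)))
    # Cluster labels are arbitrary, so take the better of the two matchings.
    return eig_pert, min(disagree, 1.0 - disagree)


# Synthetic data (an assumption for illustration): two Gaussian blobs in 2D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(-2.0, 0.0), scale=0.6, size=(100, 2)),
    rng.normal(loc=(2.0, 0.0), scale=0.6, size=(100, 2)),
])

for step in (0.1, 0.5, 1.0):
    X_tilde = np.round(X / step) * step  # quantize to a grid of width `step`
    eig_pert, mis_rate = compare(X, X_tilde)
    print(f"step={step}: eigenvector perturbation={eig_pert:.4g}, "
          f"mis-clustering rate={mis_rate:.4g}")
```

Running the loop prints, for each quantization step, how far the second eigenvector moved and what fraction of points changed cluster, which gives a rough empirical view of how the two quantities relate on a given data set.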
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where NIPS
Authors Ling Huang, Donghui Yan, Michael I. Jordan, Nina Taft