Sciweavers

APPROX
2008
Springer

Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity

Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-center clustering problem. We give a streaming $(4+\epsilon)$-approximation algorithm using $O(\epsilon^{-1} k z)$ memory for the problem with outliers, in which the clustering is allowed to drop up to $z$ of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming $(6+\epsilon)$-approximation algorithm using $O(\epsilon^{-1} \ln(\epsilon^{-1}) k + k^2)$ memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.
Key words: clustering, k-center, streaming, outliers, anonymity
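To make the outlier objective in the abstract concrete, the following is a minimal sketch of how the cost of a candidate solution could be evaluated offline: given a set of centers, it returns the smallest radius that covers all but at most z of the points. It is not the streaming algorithm of the paper, only the quantity that the approximation factor refers to. The function name `radius_with_outliers`, the use of Euclidean distance, and the sample data are assumptions made for illustration.

```python
import math


def radius_with_outliers(points, centers, z):
    """Smallest radius covering all but at most z points (hypothetical helper).

    Assumes Euclidean distance; points and centers are coordinate tuples.
    """
    # Distance from each point to its nearest center, in increasing order.
    nearest = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Dropping the z farthest points leaves the (n - z) closest ones.
    kept = nearest[:max(len(nearest) - z, 0)]
    # The cluster radius is the largest remaining distance.
    return kept[-1] if kept else 0.0


# Illustrative data: two tight groups plus one far-away point.
pts = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
ctrs = [(0, 0), (10, 0)]
print(radius_with_outliers(pts, ctrs, 0))  # 90.0 (the far point dominates)
print(radius_with_outliers(pts, ctrs, 1))  # 1.0 (the far point is dropped)
```

With z = 0 the single distant point forces a large radius, while allowing one outlier brings the radius down to that of the two tight groups, which is the effect the outlier variant is meant to capture.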
Richard Matthew McCutchen, Samir Khuller
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008
Where APPROX
Authors Richard Matthew McCutchen, Samir Khuller