Sciweavers

STOC
2003
ACM

Better streaming algorithms for clustering problems

We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant factor approximation in one pass using storage space O(k poly log n). This is a significant improvement over the previous best algorithm, which yielded a 2^{O(1/ε)} approximation using O(n^ε) space. Next we give a streaming algorithm for the k-Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.
Categories and Subject Descriptors: F.2.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity--computations on discrete structures
General Terms: Algorithms, Theory
Keywords: Clustering, k-med...
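For readers unfamiliar with the objective named in the abstract, the following is a minimal sketch of the k-Median cost and its outlier variant. It is not the paper's streaming algorithm: it only evaluates a given set of centers on points held in memory. The function name `kmedian_cost`, the `outlier_fraction` parameter, the Euclidean metric, and the sample data are all assumptions made for illustration.

```python
import math

def kmedian_cost(points, centers, outlier_fraction=0.0):
    """Sum of distances from each point to its nearest center,
    after discarding the farthest `outlier_fraction` of the points.
    (Hypothetical helper; Euclidean distance is an assumption.)"""
    # Distance from every point to its closest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Number of points that may be ignored as outliers.
    n_drop = int(outlier_fraction * len(points))
    kept = dists[:len(dists) - n_drop]
    return sum(kept)

# Illustrative data: two tight groups plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

cost_all = kmedian_cost(points, centers)                            # every point counts
cost_trimmed = kmedian_cost(points, centers, outlier_fraction=0.2)  # farthest 20% dropped
```

In this reading, a bicriterion guarantee would compare the cost of a solution that is allowed a slightly larger `outlier_fraction` against the optimal cost at the original fraction.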
Moses Charikar, Liadan O'Callaghan, Rina Panigrahy
Added 03 Dec 2009
Updated 03 Dec 2009
Type Conference
Year 2003
Where STOC
Authors Moses Charikar, Liadan O'Callaghan, Rina Panigrahy