ICDE
2007
IEEE

Conquering the Divide: Continuous Clustering of Distributed Data Streams

Data is often collected over a distributed network, but in many cases it is so voluminous that collecting it in a central location is impractical and undesirable. Instead, we must perform distributed computations over the data, guaranteeing high-quality answers even as new data arrives. In this paper, we formalize and study the problem of maintaining a clustering of such distributed data as it continuously evolves. In particular, our goal is to minimize communication and computational cost while still providing guaranteed accuracy of the clustering. We focus on k-center clustering and provide a suite of algorithms that vary based on which centralized algorithm they derive from, and on whether they maintain a single global clustering or many local clusterings that can be merged together. We show that these algorithms can be designed to give accuracy guarantees that are close to the best possible even in the centralized case. In our experiments, we see clear trends among these algo...
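For readers unfamiliar with the k-center objective named in the abstract (choose k centers so that the largest distance from any point to its nearest center is as small as possible), the following is a minimal sketch of one well-known centralized heuristic, the farthest-point greedy method, which gives a 2-approximation. The abstract does not say which centralized algorithms the paper builds on, so this choice is an assumption made purely for illustration. The function name `k_center_greedy` and the sample points are hypothetical, and the sketch does not attempt the distributed or streaming setting the paper addresses.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_center_greedy(points: List[Point], k: int) -> Tuple[List[Point], float]:
    """Farthest-point greedy heuristic for k-center (illustrative only).

    Returns the chosen centers and the clustering radius, i.e. the largest
    distance from any point to its nearest center.
    """
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")

    # Start from an arbitrary point; here, simply the first one.
    centers = [points[0]]
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]

    while len(centers) < min(k, len(points)):
        # Pick the point that is currently farthest from every center.
        idx = max(range(len(points)), key=dist.__getitem__)
        if dist[idx] == 0.0:
            break  # every point already coincides with some center
        centers.append(points[idx])
        # Update nearest-center distances against the new center.
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]

    return centers, max(dist)


# Hypothetical sample data: two well-separated pairs on a line.
sample = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers, radius = k_center_greedy(sample, k=2)
```

With the sample above, the heuristic picks one center from each pair, so every point ends up within distance 1 of a center.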
Graham Cormode, S. Muthukrishnan, Wei Zhuang
Type Conference
Year 2007
Where ICDE