Approximation algorithms for clustering uncertain data

There is an increasing quantity of data with uncertainty arising from applications such as sensor network measurements, record linkage, and the output of mining algorithms. This uncertainty is typically formalized as probability density functions over tuple values. Beyond storing and processing such data in a DBMS, it is necessary to perform other data analysis tasks such as data mining. We study the core mining problem of clustering on uncertain data, and define appropriate natural generalizations of standard clustering optimization criteria. Two variations arise, depending on whether a point is automatically associated with its optimal center, or whether it must be assigned to a fixed cluster no matter where it is actually located. For uncertain versions of k-means and k-median, we show reductions to their corresponding weighted versions on data with no uncertainties. These are simple in the unassigned case, but require some care for the assigned version. Our most interesting results...
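The difference between the two variations can be illustrated with a minimal sketch. It assumes a discrete model in which each uncertain point is a finite list of (location, probability) pairs; that model, and the names `unassigned_cost`, `assigned_cost`, `weighted_cost`, and `flatten`, are assumptions made for illustration and are not taken from the paper. Under this assumption, linearity of expectation makes the unassigned expected cost equal to an ordinary weighted clustering cost over all possible locations, which is one way to see why that case might reduce simply; it is not necessarily the authors' construction.

```python
from math import dist

def unassigned_cost(points, centers, power=1):
    """Expected cost when each realized location goes to its nearest center."""
    return sum(
        prob * min(dist(loc, c) for c in centers) ** power
        for point in points
        for loc, prob in point
    )

def assigned_cost(points, centers, assignment, power=1):
    """Expected cost when point i is always served by centers[assignment[i]]."""
    return sum(
        prob * dist(loc, centers[assignment[i]]) ** power
        for i, point in enumerate(points)
        for loc, prob in point
    )

def weighted_cost(weighted_locations, centers, power=1):
    """Weighted k-median (power=1) or k-means (power=2) cost on certain data."""
    return sum(
        weight * min(dist(loc, c) for c in centers) ** power
        for loc, weight in weighted_locations
    )

def flatten(points):
    """Treat every possible location as a certain point weighted by its probability."""
    return [(loc, prob) for point in points for loc, prob in point]

# Hypothetical toy data: one point is at 0 or 10 with equal probability,
# the other is certainly at 1.
points = [
    [((0.0,), 0.5), ((10.0,), 0.5)],
    [((1.0,), 1.0)],
]
centers = [(0.0,), (10.0,)]

print(unassigned_cost(points, centers))             # matches the weighted cost below
print(weighted_cost(flatten(points), centers))
print(assigned_cost(points, centers, [0, 0]))       # both points fixed to the first center
```

In this toy instance the assigned cost is larger than the unassigned cost, because the first point must pay for the realizations that land far from its fixed center.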
Graham Cormode, Andrew McGregor
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2008
Where PODS