Sciweavers

ICALP
2009
Springer

Correlation Clustering Revisited: The "True" Cost of Error Minimization Problems

Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between element pairs. Their setting is agnostic in the sense that a ground truth clustering is not assumed to exist, and the only reasonable way to measure the cost of a solution is by comparing it with the input similarity function. This problem has been studied both in theory and in applications and was subsequently proven to be APX-hard. In this work we assume that there does exist an unknown correct clustering of the data. This is the case in applications such as record linkage in databases. In this setting, we argue that it is more reasonable to measure the accuracy of the output clustering against the unknown underlying true clustering. This corresponds to the intuition that in real life an action is penalized or rewarded based on reality and not on our noisy perception thereof. The traditional combinatorial optimizati...
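The difference between the two ways of measuring cost can be illustrated with a small sketch. It is not taken from the paper: it assumes that cost is counted as the number of element pairs on which two descriptions disagree, and the toy data (`truth`, `similar`, `merged`) and the function names are hypothetical.

```python
from itertools import combinations


def same_cluster(clustering, u, v):
    """True if u and v carry the same cluster label."""
    return clustering[u] == clustering[v]


def cost_vs_similarity(clustering, similar):
    """Pairs on which the clustering disagrees with the input similarity.

    `similar` maps frozenset({u, v}) to True (similar) or False (dissimilar).
    Assumption: unit cost per disagreeing pair.
    """
    return sum(
        1
        for u, v in combinations(sorted(clustering), 2)
        if same_cluster(clustering, u, v) != similar[frozenset((u, v))]
    )


def cost_vs_truth(clustering, truth):
    """Pairs on which the clustering disagrees with the true clustering."""
    return sum(
        1
        for u, v in combinations(sorted(clustering), 2)
        if same_cluster(clustering, u, v) != same_cluster(truth, u, v)
    )


# Hypothetical ground truth: a and b belong together, c is separate.
truth = {"a": 0, "b": 0, "c": 1}

# Noisy, inconsistent observation of the truth: the (b, c) pair is flipped,
# so a~b and b~c are reported similar while a and c are reported dissimilar.
similar = {
    frozenset(("a", "b")): True,
    frozenset(("a", "c")): False,
    frozenset(("b", "c")): True,
}

# Two candidate outputs: everything merged, and the true clustering itself.
merged = {"a": 0, "b": 0, "c": 0}

input_cost_merged = cost_vs_similarity(merged, similar)
input_cost_truth = cost_vs_similarity(truth, similar)
true_cost_merged = cost_vs_truth(merged, truth)
true_cost_truth = cost_vs_truth(truth, truth)
```

In this toy instance both candidates disagree with the noisy input on exactly one pair, yet they differ on two pairs and zero pairs respectively when compared with the true clustering, which is the distinction the abstract draws.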
Nir Ailon, Edo Liberty
Added 03 Dec 2009
Updated 03 Dec 2009
Type Conference
Year 2009
Where ICALP
Authors Nir Ailon, Edo Liberty