
SIGMOD
2007
ACM

Leveraging aggregate constraints for deduplication

We show that aggregate constraints (as opposed to pairwise constraints) that often arise when integrating multiple sources of data can be leveraged to enhance the quality of deduplication. However, despite its appeal, we show that the problem is challenging, both semantically and computationally. We define a restricted search space for deduplication that is intuitive in our context and we solve the problem optimally for the restricted space. Our experiments on real data show that incorporating aggregate constraints significantly enhances the accuracy of deduplication.

Categories and Subject Descriptors: H.2 [Database Management]: Systems
General Terms: Design, Algorithms, Experimentation
Keywords: Deduplication, Entity resolution, Constraint satisfaction
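To make the distinction concrete, here is a minimal sketch of what checking an aggregate constraint over a candidate grouping might look like. It is an illustration only: the abstract does not state the form of the constraints or the algorithm used, so the record layout, the "per-group totals must agree across sources" rule, and the names `satisfies_aggregate` and `count_satisfied` are all assumptions made for this example. A pairwise constraint would relate two individual records; the check below instead looks at a whole group at once.

```python
from collections import defaultdict

# Hypothetical records: (source, customer name, amount billed).
# The data and the two-source setup are assumptions for illustration.
records = [
    ("A", "John Smith", 100),
    ("A", "J. Smith", 50),
    ("B", "John Smith", 150),
    ("A", "Jane Smyth", 70),
    ("B", "Jane Smyth", 70),
]


def satisfies_aggregate(group):
    """Assumed aggregate constraint: within one group of records believed
    to be the same entity, the total amount from source A must equal the
    total amount from source B."""
    totals = defaultdict(int)
    for source, _name, amount in group:
        totals[source] += amount
    return totals["A"] == totals["B"]


def count_satisfied(partition):
    """Number of groups in a candidate deduplication that meet the constraint."""
    return sum(1 for group in partition if satisfies_aggregate(group))


# Candidate 1: treat "John Smith" and "J. Smith" as one customer.
merged = [records[0:3], records[3:5]]

# Candidate 2: keep "J. Smith" as a separate customer.
split = [[records[0], records[2]], [records[1]], records[3:5]]

print(count_satisfied(merged), "of", len(merged), "groups satisfy the constraint")
print(count_satisfied(split), "of", len(split), "groups satisfy the constraint")
```

In this toy data, merging the two "Smith" records from source A is the grouping under which the per-group totals agree, which is the kind of evidence a purely pairwise string comparison would not provide. Searching over groupings efficiently is the hard part the abstract alludes to, and this sketch does not attempt it.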
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2007
Where SIGMOD
Authors Surajit Chaudhuri, Anish Das Sarma, Venkatesh Ganti, Raghav Kaushik