
CLEANDB
2006
ACM

Generic Entity Resolution with Data Confidences

We consider the Entity Resolution (ER) problem (also known as deduplication or merge-purge), in which records determined to represent the same real-world entity are successively located and merged. Our approach to the ER problem is generic, in the sense that the functions for comparing and merging records are viewed as black boxes. In this context, managing numerical confidences along with the data makes the ER problem more challenging to define (e.g., how should confidences of merged records be combined?) and more expensive to compute. In this paper, we propose a sound and flexible model for the ER problem with confidences, along with efficient algorithms to solve it. We validate our algorithms through experiments that show significant performance improvements over naive schemes.
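To make the "black-box" framing concrete, here is a minimal sketch of a naive resolution loop in which the match and merge functions are passed in as opaque callables. It is not the paper's algorithm. The `Record` type, the names `naive_resolve`, `share_value`, and `merge_min`, and the rule of taking the minimum confidence on merge are all assumptions made for illustration. The sketch also discards the two originals after each merge, a simplification that may not be valid once confidences are involved.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a set of (attribute, value) pairs plus a confidence."""
    values: FrozenSet[Tuple[str, str]]
    confidence: float


MatchFn = Callable[[Record, Record], bool]
MergeFn = Callable[[Record, Record], Record]


def naive_resolve(records: List[Record], match: MatchFn, merge: MergeFn) -> List[Record]:
    """Repeatedly merge any matching pair until no pair matches.

    The loop only calls match() and merge(); it never inspects record contents.
    """
    resolved = list(records)
    changed = True
    while changed:
        changed = False
        for i in range(len(resolved)):
            for j in range(i + 1, len(resolved)):
                if match(resolved[i], resolved[j]):
                    merged = merge(resolved[i], resolved[j])
                    # Simplifying assumption: the merged record replaces both inputs.
                    resolved = [r for k, r in enumerate(resolved) if k not in (i, j)]
                    resolved.append(merged)
                    changed = True
                    break
            if changed:
                break
    return resolved


def share_value(a: Record, b: Record) -> bool:
    """Illustrative match rule: two records match if they share any pair."""
    return bool(a.values & b.values)


def merge_min(a: Record, b: Record) -> Record:
    """Illustrative merge rule: union the pairs, keep the lower confidence."""
    return Record(a.values | b.values, min(a.confidence, b.confidence))
```

Because each merge shrinks the list by one record, the loop terminates. Swapping in a different `match` or `merge` needs no change to `naive_resolve`, which is the sense in which the approach is generic.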
David Menestrina, Omar Benjelloun, Hector Garcia-Molina
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where CLEANDB
Authors David Menestrina, Omar Benjelloun, Hector Garcia-Molina