PVLDB
2010

Record Linkage with Uniqueness Constraints and Erroneous Values

Many data-management applications require integrating data from a variety of sources, where different sources may refer to the same real-world entity in different ways and some may even provide erroneous data. An important task in this process is to recognize and merge the various references that refer to the same entity. In practice, some attributes satisfy a uniqueness constraint—each real-world entity (or most entities) has a unique value for the attribute (e.g., business contact phone, address, and email). Traditional techniques tackle this case by first linking records that are likely to refer to the same real-world entity, and then fusing the linked records and resolving conflicts if any. Such methods can fall short for three reasons: first, erroneous values from sources may prevent correct linking; second, the real world may contain exceptions to the uniqueness constraints and always enforcing uniqueness can miss correct values; third, locally resolving conflicts for link...
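The "link first, then fuse" baseline the abstract criticizes can be illustrated with a minimal sketch. This is not the authors' method. The toy records, the function names (`normalize`, `link_by_phone`, `fuse`) and the choice to link on exact phone equality are all hypothetical simplifications. Linking treats the phone as a uniqueness-constrained key. Fusion then resolves name conflicts locally by majority vote. A single erroneous phone value is enough to split one real-world business into two entities, which is the first failure mode the abstract names.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (source, business name, phone).
# Source s3 carries a typo in Acme's phone number (an erroneous value).
RECORDS = [
    ("s1", "Acme Inc", "555-1234"),
    ("s2", "ACME Inc.", "555-1234"),
    ("s3", "Acme Inc", "555-1243"),
    ("s1", "Bolt Ltd", "555-7777"),
    ("s2", "Bolt Ltd", "555-7777"),
]


def normalize(name: str) -> str:
    """Lowercase and drop punctuation so trivially different spellings compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()


def link_by_phone(records):
    """Stage 1 (linking): treat phone as a uniqueness-constrained key,
    so records that share a phone are assumed to describe the same entity."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec[2]].append(rec)
    return list(clusters.values())


def fuse(cluster):
    """Stage 2 (fusion): resolve name conflicts locally by majority vote,
    keeping the single phone value that defined the cluster."""
    names = Counter(normalize(name) for _, name, _ in cluster)
    name, _ = names.most_common(1)[0]
    return name, cluster[0][2]


entities = [fuse(c) for c in link_by_phone(RECORDS)]
for entity in entities:
    print(entity)
```

Running it prints three entities. Two of them are named `acme inc`: the typo in source s3 produced a spurious second Acme instead of being recognized as a wrong value for the first one. A legitimate second phone number for a business, which is an exception to the uniqueness constraint, would be split off in exactly the same way.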
Type Journal
Authors Songtao Guo, Xin Dong, Divesh Srivastava, Remi Zajac