
DEXA
2011
Springer

Learning Top-k Transformation Rules

Record linkage identifies multiple records that refer to the same entity even if they are not bit-wise identical. It is thus an essential technology for data integration and data cleansing. Existing record linkage approaches rely mainly on similarity functions based on the surface forms of the records, and hence cannot identify complex coreferent records. This seriously limits their effectiveness. In this work, we propose an automatic method to extract the top-k high-quality transformation rules from a set of possibly coreferent record pairs. We propose an effective algorithm that performs careful local analyses for each record pair to generate candidate rules, and then chooses the top-k rules based on a scoring function. We have conducted extensive experiments on real datasets, and our proposed algorithm has a substantial advantage over the previous algorithm in both effectiveness and efficiency.
Sunanda Patro, Wei Wang
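The abstract does not specify how candidate rules are generated or how they are scored, so the following is only a rough sketch under assumptions of our own, not the authors' algorithm. It assumes records are whitespace-tokenized strings, approximates the per-pair "local analysis" by aligning the two token sequences with Python's `difflib.SequenceMatcher` and turning each mismatched segment into a candidate rule `lhs -> rhs`, and uses a simple stand-in score: the number of pairs that produce a rule. The helper names (`candidate_rules`, `top_k_rules`, `apply_rule`, `jaccard`) and the sample records are hypothetical. The `jaccard` helper illustrates the surface-similarity limitation mentioned above: an abbreviation such as "CSE" keeps the token overlap low until a learned rule rewrites it.

```python
from collections import Counter
from difflib import SequenceMatcher

# A rule maps one token sequence (lhs) to another (rhs). Hypothetical representation.
Rule = tuple[tuple[str, ...], tuple[str, ...]]


def candidate_rules(r: str, s: str) -> set[Rule]:
    """Assumed local analysis of one pair: align tokens, keep replaced segments as rules."""
    a, b = r.lower().split(), s.lower().split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    rules = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # both sides non-empty and different
            rules.add((tuple(a[i1:i2]), tuple(b[j1:j2])))
    return rules


def top_k_rules(pairs: list[tuple[str, str]], k: int) -> list[tuple[Rule, int]]:
    """Rank rules by how many pairs generate them (a stand-in score, not the paper's)."""
    support: Counter = Counter()
    for r, s in pairs:
        support.update(candidate_rules(r, s))  # each rule counted once per pair
    # Highest support first; ties broken by the rule itself for a stable order.
    ranked = sorted(support.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def apply_rule(record: str, rule: Rule) -> str:
    """Rewrite every occurrence of the rule's lhs token sequence with its rhs."""
    lhs, rhs = rule
    if not lhs:
        raise ValueError("rule lhs must be non-empty")
    tokens, out, i = record.lower().split(), [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + len(lhs)]) == lhs:
            out.extend(rhs)
            i += len(lhs)
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def jaccard(r: str, s: str) -> float:
    """Token-set Jaccard similarity, a typical surface-form measure."""
    x, y = set(r.lower().split()), set(s.lower().split())
    return len(x & y) / len(x | y)


if __name__ == "__main__":
    pairs = [
        ("Dept of Computer Science UNSW", "Department of Computer Science UNSW"),
        ("Dept of Physics ANU", "Department of Physics ANU"),
        ("School of CSE UNSW", "School of Computer Science and Engineering UNSW"),
    ]
    for (lhs, rhs), score in top_k_rules(pairs, k=2):
        print(" ".join(lhs), "->", " ".join(rhs), score)
```

On these sample pairs the sketch ranks `dept -> department` (support 2) above `cse -> computer science and engineering` (support 1). Since the input pairs are only *possibly* coreferent, a raw support count will also reward rules that arise from non-matching pairs; a real scoring function would need to account for that.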