
VLDB
2002
ACM

ALIAS: An Active Learning led Interactive Deduplication System

Deduplication, a key operation in integrating data from multiple sources, is a time-consuming, labor-intensive and domain-specific operation. We present the design of ALIAS, which uses a novel approach to ease this task by limiting the manual effort to inputting simple, domain-specific attribute similarity functions and interactively labeling a small number of record pairs. We describe how active learning is useful in selecting informative examples of duplicates and non-duplicates that can be used to train a deduplication function. ALIAS provides a mechanism for efficiently applying the function to large lists of records using a novel cluster-based execution model.
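The abstract does not say which learner ALIAS uses, so the following is only a minimal sketch of the general idea under stated assumptions: user-supplied attribute similarity functions turn each record pair into a feature vector, and a generic query-by-committee loop (bootstrap-trained decision stumps, a swapped-in technique rather than the paper's method) asks the user to label the pair the committee disagrees on most. The attribute names `name` and `city`, the similarity functions, and every function name here are hypothetical. The cluster-based execution model is not illustrated.

```python
import random
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (hypothetical user-supplied similarity function)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def exact(a: str, b: str) -> float:
    """Case- and whitespace-insensitive exact match (hypothetical)."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


# Hypothetical attributes paired with the similarity function applied to each.
SIMILARITY_FUNCS = [("name", jaccard), ("city", exact)]


def features(r1: dict, r2: dict) -> list:
    """Map a record pair to a vector of attribute similarities."""
    return [f(r1[attr], r2[attr]) for attr, f in SIMILARITY_FUNCS]


def train_stump(data):
    """Fit a one-feature threshold classifier (x[j] >= t) with fewest training errors."""
    best = None
    for j in range(len(data[0][0])):
        for t in sorted({x[j] for x, _ in data}):
            errors = sum((x[j] >= t) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, j, t)
    _, j, t = best
    return lambda x: x[j] >= t


def train_committee(labeled, size, rng):
    """Train `size` stumps, each on a bootstrap resample of the labeled pairs."""
    return [train_stump([rng.choice(labeled) for _ in labeled]) for _ in range(size)]


def most_uncertain(committee, pool):
    """Index of the unlabeled pair whose committee votes are most evenly split."""
    def disagreement(x):
        votes = sum(clf(x) for clf in committee)
        return min(votes, len(committee) - votes)
    return max(range(len(pool)), key=lambda i: disagreement(pool[i][1]))


def active_learning(records, oracle, seed_pairs, rounds=3, committee_size=5, seed=0):
    """Interactive loop: each round, the user (oracle) labels one disputed pair."""
    rng = random.Random(seed)
    labeled = [(features(records[i], records[j]), oracle(i, j)) for i, j in seed_pairs]
    pool = [((i, j), features(records[i], records[j]))
            for i, j in combinations(range(len(records)), 2)
            if (i, j) not in seed_pairs]
    for _ in range(min(rounds, len(pool))):
        committee = train_committee(labeled, committee_size, rng)
        (i, j), x = pool.pop(most_uncertain(committee, pool))
        labeled.append((x, oracle(i, j)))
    # Final deduplication function trained on all labels collected so far.
    return train_stump(labeled)
```

In use, `oracle(i, j)` stands in for the person labeling pairs interactively; the point of the design is that only the seed pairs plus one disputed pair per round need a human label, rather than every pair in the list.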
Sunita Sarawagi, Anuradha Bhamidipaty, Alok Kirpal, Chandra Mouli
Type Journal
Year 2002
Where VLDB
Authors Sunita Sarawagi, Anuradha Bhamidipaty, Alok Kirpal, Chandra Mouli