ICDM
2006
IEEE

Adaptive Blocking: Learning to Scale Up Record Linkage

Many information integration tasks require computing similarity between pairs of objects. Pairwise similarity computations are particularly important in record linkage systems, as well as in clustering and schema mapping algorithms. Because the cost of estimating similarity between all pairs of instances grows quadratically with the size of the input dataset, exhaustive pairwise comparison becomes prohibitive for large datasets and complex similarity functions, which prevents record linkage from scaling. Blocking methods alleviate this problem by efficiently selecting a subset of object pairs for which similarity is computed and treating the remaining pairs as dissimilar. Previously proposed blocking methods require manually constructing a similarity function or a set of predicates, followed by hand-tuning of parameters. In this paper, we introduce an adaptive framework for training blocking functions to be efficient and accura...
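To make the idea of blocking concrete, here is a minimal Python sketch of the hand-built, predicate-based kind of blocking that the abstract describes as prior practice. It is not the adaptive method the paper introduces. The predicate `first_token_key`, the helper `candidate_pairs`, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def first_token_key(record):
    """Hypothetical hand-written blocking predicate: lowercase first token of the name."""
    return record["name"].lower().split()[0]


def candidate_pairs(records, key_fn):
    """Return index pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[key_fn(record)].append(idx)
    pairs = set()
    for members in blocks.values():
        # Only records inside the same block are ever compared.
        pairs.update(combinations(members, 2))
    return pairs


# Made-up sample records, for illustration only.
records = [
    {"name": "Adaptive Blocking for Record Linkage"},
    {"name": "adaptive blocking: record linkage"},
    {"name": "Schema Mapping Survey"},
    {"name": "Schema mapping: a survey"},
    {"name": "Clustering Basics"},
]

# Exhaustive comparison needs n * (n - 1) / 2 pairs, which grows quadratically.
all_pairs = len(records) * (len(records) - 1) // 2
blocked = candidate_pairs(records, first_token_key)
print(f"{len(blocked)} candidate pairs instead of {all_pairs}")
```

With this predicate, only records whose names start with the same token are passed to the (potentially expensive) similarity function. A poorly chosen predicate would either miss true matches or leave too many pairs, which is the hand-tuning burden the abstract refers to.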
Mikhail Bilenko, Beena Kamath, Raymond J. Mooney
Added 11 Jun 2010
Updated 11 Jun 2010
Type Conference
Year 2006
Where ICDM
Authors Mikhail Bilenko, Beena Kamath, Raymond J. Mooney