
VLDB
2007
ACM

Randomized Algorithms for Data Reconciliation in Wide Area Aggregate Query Processing

Many aspects of the data integration problem have been considered in the literature: how to match schemas across different data sources, how to decide when different records refer to the same entity, how to efficiently perform the required entity resolution in batch fashion, and so on. However, the question of how to efficiently deploy these existing methods in a realistic, distributed enterprise integration environment has largely been ignored. The straightforward use of existing methods often requires that all data be shipped to a coordinator for cleaning, which is frequently unacceptable. We develop a set of randomized algorithms that allow existing entity resolution methods to be applied efficiently to answering aggregate queries over data distributed across multiple sites. Using our methods, it is possible to efficiently generate aggregate query results that account for duplicate and inconsistent values scattered across a federated system.
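The abstract does not spell out the algorithms themselves. As a rough illustration of the setting only, and not the authors' method, the sketch below shows one generic way to avoid shipping all data: hash-based sampling on a blocking key. It assumes a hypothetical blocking key such that records referring to the same entity share the same key at every site. Each site ships only records whose key hashes into the sample, so likely duplicates travel together. The coordinator then applies an arbitrary entity-resolution function within each sampled block and scales the resulting SUM by the inverse sampling rate (a Horvitz-Thompson-style estimate, roughly unbiased if the hash behaves like a random function). The names `block_of`, `resolve`, `value_of`, and `rate` are hypothetical.

```python
import hashlib
from collections import defaultdict


def in_sample(block_key: str, rate: float) -> bool:
    # Deterministic hash-based sampling: every site makes the same decision
    # for the same (hypothetical) blocking key, so records that may be
    # duplicates of one another are sampled together.
    digest = hashlib.sha256(block_key.encode("utf-8")).digest()
    # Keep 53 bits so the value is exactly representable and lies in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < rate


def site_sample(records, block_of, rate):
    # Runs at each site: ship only records whose block falls in the sample.
    return [r for r in records if in_sample(block_of(r), rate)]


def estimate_sum(sites, block_of, resolve, value_of, rate):
    # Runs at the coordinator: group shipped records by block, resolve
    # entities within each block, sum one value per resolved entity, and
    # scale by 1/rate to estimate the SUM over all distinct entities.
    blocks = defaultdict(list)
    for records in sites:
        for r in site_sample(records, block_of, rate):
            blocks[block_of(r)].append(r)
    total = 0.0
    for recs in blocks.values():
        for entity in resolve(recs):
            total += value_of(entity)
    return total / rate


# Toy usage: two sites, with "Acme Corp" stored at both.
site_a = [{"name": "Acme Corp", "sales": 100.0}, {"name": "Globex", "sales": 50.0}]
site_b = [{"name": "ACME CORP ", "sales": 100.0}, {"name": "Initech", "sales": 30.0}]


def norm(r):
    # Hypothetical blocking key: normalized company name.
    return r["name"].strip().lower()


def resolve(recs):
    # Hypothetical resolver: treat each block as one entity and reconcile
    # inconsistent values by taking the maximum.
    return [max(r["sales"] for r in recs)]


print(estimate_sum([site_a, site_b], norm, resolve, lambda v: v, rate=1.0))  # prints 180.0
```

With `rate=1.0` every block is shipped and the duplicate Acme record is counted once; smaller rates trade accuracy for less data sent to the coordinator.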
Fei Xu, Chris Jermaine
Type Conference
Year 2007
Where VLDB