Sciweavers

ICDE
2010
IEEE

Detecting Inconsistencies in Distributed Data

One of the central problems for data quality is inconsistency detection. Given a database D and a set Σ of dependencies as data quality rules, we want to identify tuples in D that violate some rules in Σ. When D is a centralized database, there have been effective SQL-based techniques for finding violations. It is, however, far more challenging when data in D is distributed, in which inconsistency detection often necessarily requires shipping data from one site to another. This paper develops techniques for detecting violations of conditional functional dependencies (CFDs) in relations that are fragmented and distributed across different sites. (1) We formulate the detection problem in various distributed settings as optimization problems, measured by either network traffic or response time. (2) We show that it is beyond reach in practice to find optimal detection methods: the detection problem is NP-complete when the data is partitioned either horizontally or vertically, and ...
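To make the notion of a CFD violation concrete, the following is a minimal sketch of detection over a single, centralized, in-memory relation. It is not the paper's distributed algorithm, and every name in it (`cfd_violations`, `WILDCARD`, the `CC`/`zip`/`street` attributes and the sample rows) is a hypothetical chosen for illustration. It assumes a CFD is given as left-hand-side attributes, one right-hand-side attribute, and a pattern that maps each attribute to a constant or to the wildcard `_`.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed notation for "any value" in a pattern


def matches(value, pattern):
    """A value matches a pattern entry if the entry is a wildcard or equal."""
    return pattern == WILDCARD or value == pattern


def cfd_violations(rows, lhs, rhs, pattern):
    """Return sorted indices of rows that violate the CFD (lhs -> rhs, pattern).

    rows    : list of dicts, one per tuple (hypothetical in-memory relation)
    lhs     : list of left-hand-side attribute names
    rhs     : single right-hand-side attribute name
    pattern : dict mapping each attribute in lhs + [rhs] to a constant or "_"
    """
    violating = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        # The CFD only constrains tuples that match the LHS pattern.
        if not all(matches(row[a], pattern[a]) for a in lhs):
            continue
        # Single-tuple violation: RHS value contradicts a constant RHS pattern.
        if not matches(row[rhs], pattern[rhs]):
            violating.add(i)
        groups[tuple(row[a] for a in lhs)].append(i)
    # Multi-tuple violation: tuples agree on the LHS but differ on the RHS.
    for members in groups.values():
        if len({rows[i][rhs] for i in members}) > 1:
            violating.update(members)
    return sorted(violating)


# Hypothetical rule: for country code 44, zip determines street.
rows = [
    {"CC": "44", "zip": "EH4 1DT", "street": "Mayfield"},
    {"CC": "44", "zip": "EH4 1DT", "street": "Crichton"},
    {"CC": "01", "zip": "07974", "street": "Tree Ave"},
    {"CC": "01", "zip": "07974", "street": "Elm St"},
]
pattern = {"CC": "44", "zip": WILDCARD, "street": WILDCARD}
print(cfd_violations(rows, ["CC", "zip"], "street", pattern))  # [0, 1]
```

Rows 2 and 3 also disagree on street for the same zip, but they are not reported because the pattern restricts the rule to tuples with `CC = 44`. In a distributed setting the grouping step is where the difficulty arises: tuples that must be compared may live on different sites, so some data has to be shipped before they can be grouped.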
Added: 07 Mar 2010
Updated: 07 Mar 2010
Type: Conference
Year: 2010
Where: ICDE
Authors: Wenfei Fan, Floris Geerts, Shuai Ma, Heiko Müller