Sciweavers

SIGMOD
2010
ACM

ERACER: a database approach for statistical inference and data cleaning

Real-world databases often contain syntactic and semantic errors, in spite of integrity constraints and other safety measures incorporated into modern DBMSs. We present ERACER, an iterative statistical framework for inferring missing information and correcting such errors automatically. Our approach is based on belief propagation and relational dependency networks, and includes an efficient approximate inference algorithm that is easily implemented in standard DBMSs using SQL and user-defined functions. The system performs the inference and cleansing tasks in an integrated manner, using shrinkage techniques to infer correct values accurately even in the presence of dirty data. We evaluate the proposed methods empirically on multiple synthetic and real data sets. The results show that our framework achieves accuracy comparable to a baseline statistical method using Bayesian networks with exact inference. However, our framework has wider applicability than the Bayesian network baseline...
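To make the idea of shrinkage concrete, here is a minimal Python sketch of a generic shrinkage estimator used to fill in missing values. It is not the ERACER algorithm, which the abstract describes as belief propagation over relational dependency networks implemented in SQL. The function names, the smoothing parameter `k`, and the toy rows are all assumptions made for illustration.

```python
from collections import defaultdict


def shrunk_group_means(rows, k=2.0):
    """Return {group: mean shrunk toward the global mean}.

    rows is a list of (group, value) pairs; value may be None (missing).
    k is a hypothetical smoothing weight: larger k pulls small groups
    harder toward the global mean.
    """
    observed = [v for _, v in rows if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    global_mean = sum(observed) / len(observed)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, value in rows:
        if value is not None:
            sums[group] += value
            counts[group] += 1

    means = {}
    for group in {g for g, _ in rows}:
        n = counts[group]
        # Equivalent to n/(n+k) * group_mean + k/(n+k) * global_mean;
        # a group with no observations falls back to the global mean.
        means[group] = (sums[group] + k * global_mean) / (n + k)
    return means


def fill_missing(rows, k=2.0):
    """Replace each missing value with its group's shrunk mean."""
    means = shrunk_group_means(rows, k)
    return [(g, means[g] if v is None else v) for g, v in rows]


# Toy data (assumed, not from the paper): group "c" has no observed values.
rows = [("a", 10.0), ("a", 12.0), ("a", None),
        ("b", 20.0), ("b", None), ("c", None)]
filled = fill_missing(rows)
```

Groups with few observations lean on the global mean, which keeps a single dirty value from dominating an estimate. The same weighting could in principle be expressed as a SQL aggregate, but that mapping is not shown here.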
Chris Mayfield, Jennifer Neville, Sunil Prabhakar
Added 18 Jul 2010
Updated 18 Jul 2010
Type Conference
Year 2010
Where SIGMOD
Authors Chris Mayfield, Jennifer Neville, Sunil Prabhakar