Sciweavers

PVLDB
2010

On Multi-Column Foreign Key Discovery

A foreign/primary key relationship between relational tables is one of the most important constraints in a database. From a data analysis perspective, discovering foreign keys is a crucial step in understanding and working with the data. Nevertheless, more often than not, foreign key constraints are not specified in the data, for various reasons; e.g., some associations are not known to designers but are inherent in the data, while others become invalid due to data inconsistencies. This work proposes a robust algorithm for discovering single-column and multi-column foreign keys. Previous work concentrated mostly on discovering single-column foreign keys using a variety of rules, like inclusion dependencies, column names, and minimum/maximum values. We first propose a general rule, termed Randomness, that subsumes a variety of other rules. We then develop efficient approximation algorithms for evaluating randomness, using only two passes over the data. Finally, we validate our appro...
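To illustrate one of the earlier rules the abstract mentions, the following is a minimal sketch of a multi-column inclusion dependency check: every combination of values in the candidate foreign key columns must also appear in the referenced key columns. This is not the paper's Randomness rule or its two-pass approximation algorithm. The table contents, the column names, and the helper names `project` and `inclusion_holds` are hypothetical and chosen only for this example.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

Row = Dict[str, Any]


def project(rows: List[Row], cols: Sequence[str]) -> Set[Tuple[Any, ...]]:
    """Return the set of distinct value tuples for the given columns."""
    return {tuple(row[c] for c in cols) for row in rows}


def inclusion_holds(
    child_rows: List[Row],
    child_cols: Sequence[str],
    parent_rows: List[Row],
    parent_cols: Sequence[str],
) -> bool:
    """Check whether child_cols is included in parent_cols (exact check).

    Tuples containing None are skipped, mirroring the usual treatment of
    NULLs in foreign key columns (an assumption of this sketch).
    """
    if len(child_cols) != len(parent_cols):
        raise ValueError("column lists must have the same length")
    child_values = {
        v for v in project(child_rows, child_cols) if None not in v
    }
    return child_values <= project(parent_rows, parent_cols)


# Hypothetical data: a two-column key (region, customer_no).
customers = [
    {"region": "EU", "customer_no": 1, "name": "Ada"},
    {"region": "EU", "customer_no": 2, "name": "Bob"},
    {"region": "US", "customer_no": 1, "name": "Cy"},
]
orders = [
    {"order_id": 10, "cust_region": "EU", "cust_no": 2},
    {"order_id": 11, "cust_region": "US", "cust_no": 1},
]
bad_orders = orders + [{"order_id": 12, "cust_region": "US", "cust_no": 2}]

ok = inclusion_holds(
    orders, ["cust_region", "cust_no"], customers, ["region", "customer_no"]
)
broken = inclusion_holds(
    bad_orders, ["cust_region", "cust_no"], customers, ["region", "customer_no"]
)
```

Note that each column of `bad_orders` is individually included in the corresponding column of `customers`, yet the pair `("US", 2)` is not, which is why multi-column candidates have to be tested as tuples. Inclusion is also only a necessary condition for a foreign key, so a check like this cannot by itself rule out spurious candidates.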
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where PVLDB
Authors Meihui Zhang, Marios Hadjieleftheriou, Beng Chin Ooi, Cecilia M. Procopiuc, Divesh Srivastava