PVLDB
2010

Global Detection of Complex Copying Relationships Between Sources

Web technologies have enabled data sharing between sources but have also simplified copying (and often publishing without proper attribution). The copying relationships can be complex: some sources copy from multiple sources on different subsets of data, some co-copy from the same source, and some transitively copy from another. Understanding such copying relationships is desirable both for business purposes and for improving many key components in data integration, such as resolving conflicts across various sources, reconciling distinct references to the same real-world entity, and efficiently answering queries over multiple sources. Recent work has studied how to detect copying between a pair of sources, but these techniques can fall short in the presence of complex copying relationships. In this paper we describe techniques that discover global copying relationships between a set of structured sources. Towards this goal we make two contributions. First, we propose a global detection ...
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where PVLDB
Authors Xin Dong, Laure Berti-Equille, Yifan Hu, Divesh Srivastava