Sciweavers

EDBT
2009
ACM

Optimized union of non-disjoint distributed data sets

In a variety of applications, ranging from data integration to distributed query evaluation, there is a need to obtain sets of data items from several sources (peers) and compute their union. As these sets often contain common data items, avoiding the transmission of redundant information is essential for effective union computation. In this paper we define the notion of optimal union plans for non-disjoint data sets residing on distinct peers, and present efficient algorithms for computing and executing such optimal plans. Our algorithms avoid redundant data transmission and make optimal use of the available network bandwidth. A challenge in the design of optimal plans is the lack of a complete map of the distribution of the data items among peers. We analyze the information required for optimal planning and propose novel techniques to obtain compact, cheap-to-communicate descriptions of the data sources. We then exploit these descriptions for efficient union computation with reasonable accuracy....
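The abstract does not say which summary structure the authors use, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes a Bloom filter as the compact description: peer A ships its items plus a small filter, and peer B transmits only the items the filter does not recognize. The names `BloomFilter` and `approximate_union`, and the default sizes, are hypothetical choices made for this example.

```python
import hashlib


class BloomFilter:
    """Compact, approximate set summary (assumed here; not taken from the paper)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def approximate_union(peer_a, peer_b, num_bits=1024, num_hashes=4):
    """Return (union, items_sent_by_b) for two hypothetical peers.

    Peer A sends all of its items. Peer B receives only A's filter and
    sends the items that the filter reports as absent.
    """
    summary = BloomFilter(num_bits, num_hashes)
    for item in peer_a:
        summary.add(item)
    sent_by_b = [item for item in peer_b if item not in summary]
    return set(peer_a) | set(sent_by_b), sent_by_b


peer_a = {"a", "b", "c"}
peer_b = {"b", "c", "d"}
union, sent_by_b = approximate_union(peer_a, peer_b)
```

Because a Bloom filter has no false negatives, items that both peers hold are never sent twice. A false positive can cause an item held only by B to be skipped, which is one way a cheap summary trades exactness for "reasonable accuracy"; a larger `num_bits` lowers that risk at the cost of a bigger summary.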
Itay Dar, Tova Milo, Elad Verbin
Added 19 May 2010
Updated 19 May 2010
Type Conference
Year 2009
Where EDBT
Authors Itay Dar, Tova Milo, Elad Verbin