Sketching Sampled Data Streams

Sampling is used as a universal method to reduce the running time of computations: the computation is performed on a much smaller sample, and the result is then scaled to compensate for the difference in size. Sketches are a popular approximation method for data streams and have proved useful for estimating frequency moments and aggregates over joins. One way to further improve the running time of sketches is to compute the sketch over a sample of the stream rather than over the entire stream. In this paper we analyze the behavior of the sketch estimator computed over a stream sample for the size-of-join and self-join size problems. Our analysis is developed for a generic sampling process. We instantiate the results of the analysis for all three major types of sampling: Bernoulli sampling, which is used for load shedding; sampling with replacement, which is used to generate i.i.d. samples from a distribution; and sampli...
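The abstract does not spell out the estimator, so the following is only a minimal sketch under our own assumptions, not the authors' method: an AGMS-style sketch (counters of random +1/-1 signs) is built over a Bernoulli sample with rate p, and the scaling step is derived here. A key with frequency f_i appears f'_i ~ Binomial(f_i, p) times in the sample, so E[f'_i^2] = p^2 f_i^2 + p(1 - p) f_i; subtracting (1 - p) times the sample size and dividing by p^2 gives an unbiased self-join size estimate. For the size of join, two independent samples with rates p and q give E[f'_i g'_i] = pq f_i g_i, so the sketch product is divided by pq. All names (FourWiseSign, AGMSSketch, sketch_bernoulli_sample, self_join_estimate, join_estimate) are hypothetical.

```python
import random

PRIME = 2**31 - 1  # Mersenne prime used as the hash field (assumption)


class FourWiseSign:
    """Maps non-negative integer keys to +1/-1 via a random degree-3 polynomial mod PRIME.

    Random degree-3 polynomials give 4-wise independent hash values; taking the
    low bit as the sign is a common shortcut with a negligible bias.
    """

    def __init__(self, rng: random.Random):
        self.coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def __call__(self, key: int) -> int:
        h = 0
        for c in self.coeffs:  # Horner evaluation of the polynomial
            h = (h * key + c) % PRIME
        return 1 if h & 1 else -1


class AGMSSketch:
    """k independent counters X_j = sum over sketched tuples of s_j(key)."""

    def __init__(self, num_counters: int, seed: int):
        rng = random.Random(seed)  # same seed -> same sign functions
        self.signs = [FourWiseSign(rng) for _ in range(num_counters)]
        self.counters = [0] * num_counters
        self.count = 0  # number of tuples that entered the sketch (sample size)

    def update(self, key: int) -> None:
        self.count += 1
        for j, sign in enumerate(self.signs):
            self.counters[j] += sign(key)


def sketch_bernoulli_sample(stream, p, num_counters, sketch_seed, sample_seed):
    """Keep each tuple independently with probability p and sketch only the kept ones."""
    coin = random.Random(sample_seed)
    sk = AGMSSketch(num_counters, sketch_seed)
    for key in stream:
        if coin.random() < p:
            sk.update(key)
    return sk


def self_join_estimate(sk: AGMSSketch, p: float) -> float:
    """Estimate sum_i f_i^2 of the full stream from a sketch of a Bernoulli(p) sample."""
    # Averaging X_j^2 over counters estimates sum_i f'_i^2 of the sample.
    sample_f2 = sum(x * x for x in sk.counters) / len(sk.counters)
    # E[sum f'^2] = p^2 F2 + p(1-p) F1 and E[count] = p F1, so remove the bias and rescale.
    return (sample_f2 - (1 - p) * sk.count) / (p * p)


def join_estimate(sk_f: AGMSSketch, sk_g: AGMSSketch, p: float, q: float) -> float:
    """Estimate sum_i f_i g_i from sketches of two independent Bernoulli samples.

    Both sketches must be built with the same sketch_seed so their sign functions match.
    """
    prod = sum(a * b for a, b in zip(sk_f.counters, sk_g.counters)) / len(sk_f.counters)
    return prod / (p * q)
```

As a sanity check, a stream consisting of a single key repeated 10 times, sketched with p = 1, yields a self-join estimate of exactly 100, since every counter equals +10 or -10. Averaging several counters is one simple way to reduce the sketch variance; the variance added by sampling is exactly what the paper analyzes, and this sketch does not attempt to reproduce that analysis.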
Florin Rusu, Alin Dobra
Added 19 May 2010
Updated 19 May 2010
Type Conference
Year 2009
Where ICDE