Sciweavers

EDBT
2006
ACM

Deferred Maintenance of Disk-Based Random Samples

Random sampling is a well-known technique for approximate processing of large datasets. We introduce a set of algorithms for incremental maintenance of large random samples on secondary storage. We show that the sample maintenance cost can be reduced by refreshing the sample in a deferred manner. We introduce a novel type of log file that follows the intuition that only a "sample" of the operations on the base data has to be considered in order to maintain a random sample in a statistically correct way. Additionally, we develop a deferred refresh algorithm that updates the sample using only fast sequential disk access and does not require any main memory. We conducted an extensive set of experiments and found that our algorithms reduce maintenance cost by several orders of magnitude.
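The intuition that only a "sample" of the operations needs to be logged can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an insertion-only base table, uses standard reservoir sampling as the underlying scheme, and lets in-memory lists stand in for the sample file and the log file on disk. The class name `DeferredReservoir` and its methods are hypothetical names chosen for this illustration.

```python
import random


class DeferredReservoir:
    """Illustrative sketch only: reservoir sampling with a deferred refresh.

    Assumptions (not from the paper): insertions only, and Python lists
    stand in for the on-disk sample and log files.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity      # fixed sample size
        self.sample = []              # stands in for the sample file
        self.log = []                 # stands in for the log file
        self.seen = 0                 # number of insertions observed so far
        self.rng = random.Random(seed)

    def insert(self, item):
        """Record an insertion into the base data."""
        self.seen += 1
        if self.seen <= self.capacity:
            # Initial fill: the first `capacity` items enter the sample directly.
            self.sample.append(item)
        elif self.rng.random() < self.capacity / self.seen:
            # The item is accepted with probability capacity / seen, as in
            # standard reservoir sampling. Only accepted items are logged;
            # rejected insertions leave no trace.
            self.log.append(item)

    def refresh(self):
        """Apply the logged candidates to the sample in arrival order."""
        for item in self.log:
            victim = self.rng.randrange(self.capacity)
            self.sample[victim] = item
        self.log.clear()


if __name__ == "__main__":
    r = DeferredReservoir(capacity=10, seed=42)
    for i in range(1000):
        r.insert(i)
    print("logged candidates:", len(r.log))
    r.refresh()
    print("sample after refresh:", sorted(r.sample))
```

Because the victim slot is drawn independently of the acceptance decision, postponing the replacements until `refresh` yields the same sample distribution as applying each one immediately. The log holds only the accepted candidates, which become increasingly rare as the base data grows. This sketch overwrites random slots, so it does not reproduce the sequential-access refresh described in the abstract.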
Rainer Gemulla, Wolfgang Lehner
Added: 14 Oct 2010
Updated: 14 Oct 2010
Type: Conference
Year: 2006
Where: EDBT
Authors: Rainer Gemulla, Wolfgang Lehner