Sciweavers

SIGMOD
2004
ACM

Effective Use of Block-Level Sampling in Statistics Estimation

Block-level sampling is far more efficient than true uniform-random sampling over a large database, but prone to significant errors if used to create database statistics. In this paper, we develop principled approaches to overcome this limitation of block-level sampling for histograms as well as distinct-value estimations. For histogram construction, we give a novel two-phase adaptive method in which the sample size required to reach a desired accuracy is decided based on a first phase sample. This method is significantly faster than previous iterative methods proposed for the same problem. For distinct-value estimation, we show that existing estimators designed for uniform-random samples may perform very poorly if used directly on block-level samples. We present a key technique that computes an appropriate subset of a block-level sample that is suitable for use with most existing estimators. This, to the best of our knowledge, is the first principled method for distinct-value estimat...
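The failure mode described for distinct-value estimation can be reproduced on a small example. The sketch below is an illustration under stated assumptions, not the paper's method. It uses a hypothetical clustered table in which each 100-row block holds a single value. It applies the GEE estimator (Charikar et al., 2000), one example of an estimator designed for uniform-random row samples, to a uniform row sample and to a block-level sample of the same size. The helper names (gee_estimate, ratio_error) and all table parameters are hypothetical. The paper's remedy, which computes a suitable subset of the block-level sample, is not implemented here.

```python
import math
import random
from collections import Counter


def gee_estimate(sample, table_size):
    """GEE distinct-value estimator: sqrt(N/n) * f1 + sum of f_j for j >= 2,
    where f_j is the number of values seen exactly j times in the sample."""
    n = len(sample)
    freq_of_freq = Counter(Counter(sample).values())
    f1 = freq_of_freq.get(1, 0)
    repeated = sum(count for j, count in freq_of_freq.items() if j >= 2)
    return math.sqrt(table_size / n) * f1 + repeated


def ratio_error(estimate, truth):
    """Symmetric ratio error: 1.0 means exact; larger is worse."""
    return max(estimate / truth, truth / estimate)


# Hypothetical clustered table: 1,000 distinct values with 100 copies each,
# stored in sorted order, so every 100-row block holds a single value.
ROWS_PER_BLOCK = 100
SAMPLE_ROWS = 1000
table = [value for value in range(1000) for _ in range(100)]
blocks = [table[i:i + ROWS_PER_BLOCK] for i in range(0, len(table), ROWS_PER_BLOCK)]
true_distinct = len(set(table))

rng = random.Random(42)

# Uniform row-level sample: 1,000 rows drawn independently of the layout
# (on disk this may touch up to 1,000 different blocks).
uniform_sample = rng.sample(table, SAMPLE_ROWS)

# Block-level sample with the same row count: 10 whole blocks.
chosen_blocks = rng.sample(blocks, SAMPLE_ROWS // ROWS_PER_BLOCK)
block_sample = [row for block in chosen_blocks for row in block]

uniform_est = gee_estimate(uniform_sample, len(table))
block_est = gee_estimate(block_sample, len(table))

print(f"true distinct values:  {true_distinct}")
print(f"GEE on uniform sample: {uniform_est:.0f} "
      f"(ratio error {ratio_error(uniform_est, true_distinct):.1f})")
print(f"GEE on block sample:   {block_est:.0f} "
      f"(ratio error {ratio_error(block_est, true_distinct):.1f})")
```

On this table the block-level estimate is 10 for every seed. Each sampled block contributes exactly one value seen 100 times, so the estimator sees no singletons and has no signal about unseen values. The uniform-sample estimate depends on the seed. It cannot exceed sqrt(100) * 1,000 = 10,000, so any overestimate is bounded by a factor of 10, while the block-level sample is off by a factor of 100.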
Surajit Chaudhuri, Gautam Das, Utkarsh Srivastava
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2004
Where SIGMOD
Authors Surajit Chaudhuri, Gautam Das, Utkarsh Srivastava