
VLDB
1995
ACM

Sampling-Based Estimation of the Number of Distinct Values of an Attribute

We present several new sampling-based estimators of the number of distinct values of an attribute in a relation. We compare these new estimators empirically with estimators from the database and statistical literature, using a large number of attribute-value distributions drawn from a variety of real-world databases. This appears to be the first extensive comparison of distinct-value estimators in either the database or the statistical literature, and it is certainly the first to use highly skewed data of the sort frequently encountered in database applications. Our experiments indicate that a new "hybrid" estimator yields the highest precision on average for a given sampling fraction. This estimator explicitly takes into account the degree of skew in the data and combines a new "smoothed jackknife" estimator with an estimator due to Shlosser. We also investigate how the hybrid estimator behaves as the size of the database scales up.
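To make the role of the sample's frequency profile concrete, here is a minimal sketch of Shlosser's estimator in the form in which it is commonly stated in the distinct-value estimation literature. It is not the authors' hybrid or smoothed jackknife estimator. Let n be the sample size, N the number of rows, q = n/N the sampling fraction, d the number of distinct values seen in the sample, and f_i the number of values that occur exactly i times in the sample. The estimate is d + f_1 · Σ (1-q)^i f_i / Σ i q (1-q)^(i-1) f_i. Shlosser's derivation assumes Bernoulli sampling at rate q. Taking q = n/N for a fixed-size uniform sample is an assumption of this sketch, and the function names `frequency_profile` and `shlosser_estimate` are hypothetical.

```python
from collections import Counter


def frequency_profile(sample):
    """Return {i: f_i}, where f_i is the number of distinct values
    that occur exactly i times in the sample."""
    value_counts = Counter(sample)
    return Counter(value_counts.values())


def shlosser_estimate(sample, population_size):
    """Sketch of Shlosser's estimator of the number of distinct values
    in a relation of `population_size` rows, given a random sample.

    Assumption: the sampling fraction q is taken to be n / N.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if n > population_size:
        raise ValueError("sample cannot be larger than the population")

    q = n / population_size      # sampling fraction
    f = frequency_profile(sample)
    d = sum(f.values())          # distinct values observed in the sample
    f1 = f.get(1, 0)             # values seen exactly once

    # With no singletons the correction term is zero: estimate d.
    if f1 == 0:
        return float(d)

    numer = sum((1 - q) ** i * fi for i, fi in f.items())
    denom = sum(i * q * (1 - q) ** (i - 1) * fi for i, fi in f.items())
    return d + f1 * numer / denom
```

For example, `shlosser_estimate(["a", "a", "b", "c", "c", "c", "d"], 70)` scales the 4 observed distinct values up to roughly 14.7, because two of the four values were seen only once. When the "sample" is the whole relation (q = 1), the correction term vanishes and the estimate equals d. The singleton count f_1 drives the correction, which is one reason the degree of skew in the data matters so much for estimators of this kind.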
Type Conference
Year 1995
Where VLDB
Authors Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes