Sciweavers

CIKM
2005
Springer

Towards estimating the number of distinct value combinations for a set of attributes

Accurately and efficiently estimating the number of distinct values for an attribute or a set of attributes in a data set is of critical importance to many database operations, such as query optimization and approximate query answering. Previous work has focused on estimating the number of distinct values for a single attribute, and most existing work adopts a data sampling approach. This paper addresses the equally important issue of estimating the number of distinct value combinations for multiple attributes, which we call COLSCARD (for COLumn Set CARDinality). It also takes a different approach that uses existing statistical information (e.g., histograms) available on the individual attributes to assist estimation. We start with cases where exact frequency information on individual attributes is available, and present a pair of lower and upper bounds on COLSCARD that are consistent with the available information, as well as an estimator of COLSCARD based on probability....
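To make the idea of bounding and estimating COLSCARD from per-attribute frequencies concrete, here is a minimal Python sketch. It is not the paper's method: the bounds shown are only the elementary ones (at least the largest per-attribute distinct count, at most the smaller of the row count and the product of distinct counts), and the estimator assumes the attributes are statistically independent, which is an assumption introduced here for illustration. The function names `colscard_bounds` and `colscard_estimate` are hypothetical.

```python
from itertools import product
from math import prod


def colscard_bounds(freqs, n_rows):
    """Elementary bounds on the number of distinct value combinations.

    freqs: one dict per attribute, mapping each value to its exact row count.
    These are the trivial bounds, not the tighter ones derived in the paper.
    """
    distinct_counts = [len(f) for f in freqs]
    lower = max(distinct_counts)                 # every value of each attribute occurs
    upper = min(prod(distinct_counts), n_rows)   # no more combinations than rows
    return lower, upper


def colscard_estimate(freqs, n_rows):
    """Expected number of distinct combinations, ASSUMING independent attributes.

    Each combination has probability equal to the product of its values'
    relative frequencies; it appears in at least one of n_rows rows with
    probability 1 - (1 - p) ** n_rows.
    """
    total = 0.0
    for counts in product(*(f.values() for f in freqs)):
        p = prod(c / n_rows for c in counts)
        total += 1.0 - (1.0 - p) ** n_rows
    return total


# Hypothetical two-attribute table with 4 rows.
freqs = [{"a": 2, "b": 2}, {"x": 3, "y": 1}]
lower, upper = colscard_bounds(freqs, n_rows=4)   # (2, 4)
estimate = colscard_estimate(freqs, n_rows=4)
```

The loop enumerates every possible combination, so this sketch is practical only when the product of the distinct counts is small. The independence-based estimate is not guaranteed to fall between the bounds in general, so a caller may want to clamp it to `[lower, upper]`.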
Xiaohui Yu, Calisto Zuzarte, Kenneth C. Sevcik
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where CIKM