Histograms reloaded: the merits of bucket diversity

Virtually all histograms store for each bucket the number of distinct values it contains and their average frequency. In this paper, we question this paradigm. We start out by investigating the estimation precision of three commercial database systems that also follow the above paradigm. It turns out that huge errors are quite common. We then introduce new bucket types and investigate their accuracy when building optimal histograms with them. The results are ambiguous: there is no clear winner among the bucket types. At this point, we (1) switch to heterogeneous histograms, where different buckets of the same histogram may be of different types, and (2) design more bucket types. The nice consequence of introducing heterogeneous histograms is that we can guarantee decent upper error bounds while requiring far less space than homogeneous histograms.
Categories and Subject Descriptors H.2.4 [Database Management]: Systems—Query process
Carl-Christian Kanne, Guido Moerkotte
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010