
PODS
2005
ACM

Histograms revisited: when are histograms the best approximation method for aggregates over joins?

The traditional statistical assumption for interpreting histograms and justifying approximate query processing methods based on them is that all elements in a bucket have the same frequency: the so-called uniform distribution assumption. In this paper we show that a significantly less restrictive statistical assumption (the elements within a bucket are randomly arranged even though they might have different frequencies) leads to identical formulae for approximating aggregate queries using histograms. This observation allows us to identify scenarios in which histograms are well suited as approximation methods (in fact, we show that in these situations sampling and sketching are significantly worse) and to provide tight error guarantees for the quality of approximations. At the same time we show that, on average, histograms are rather poor approximators outside these scenarios.
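
To make the claim about "identical formulae" concrete, the following is a minimal sketch, not the paper's own code or notation. It assumes the textbook per-bucket estimate for the size of an equi-join: the product of the two bucket totals divided by the bucket width. The function names, the bucket layout, and the toy frequency vectors are all hypothetical. The sketch checks two things: the estimate is exact when frequencies are uniform within each bucket, and it equals the average exact join size when the frequencies in a bucket are non-uniform but arranged in every possible order.

```python
from itertools import permutations
from statistics import mean
from typing import List, Sequence, Tuple


def exact_join_size(f: Sequence[int], g: Sequence[int]) -> int:
    """Exact equi-join size: sum over domain values of f[v] * g[v]."""
    return sum(a * b for a, b in zip(f, g))


def histogram_join_estimate(
    f: Sequence[int],
    g: Sequence[int],
    buckets: List[Tuple[int, int]],
) -> float:
    """Histogram estimate of the join size (assumed textbook formula).

    Each bucket is a half-open range [lo, hi) of domain values. Only the
    bucket totals are used, as a histogram would store them.
    """
    estimate = 0.0
    for lo, hi in buckets:
        width = hi - lo
        estimate += sum(f[lo:hi]) * sum(g[lo:hi]) / width
    return estimate


# Case 1: frequencies are uniform inside each bucket, so the estimate is exact.
f_uniform = [4, 4, 4, 4, 1, 1, 1, 1]
g_uniform = [2, 2, 2, 2, 3, 3, 3, 3]
two_buckets = [(0, 4), (4, 8)]
uniform_exact = exact_join_size(f_uniform, g_uniform)
uniform_estimate = histogram_join_estimate(f_uniform, g_uniform, two_buckets)

# Case 2: frequencies differ inside a single bucket. Averaging the exact join
# size over every arrangement of g within the bucket gives the same value as
# the histogram formula.
f_skewed = [1, 7, 2, 6]
g_skewed = [5, 3, 5, 3]
one_bucket = [(0, 4)]
skewed_estimate = histogram_join_estimate(f_skewed, g_skewed, one_bucket)
average_over_arrangements = mean(
    exact_join_size(f_skewed, arrangement)
    for arrangement in permutations(g_skewed)
)

print(uniform_exact, uniform_estimate)             # 44 44.0
print(skewed_estimate, average_over_arrangements)  # 64.0 64
```

For the particular arrangement in Case 2 the exact join size is 54, so a single instance can still deviate from the estimate; the sketch only illustrates agreement in expectation and says nothing about the error guarantees derived in the paper.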
Alin Dobra
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2005
Where PODS
Authors Alin Dobra