Workload-Optimal Histograms on Streams

Histograms are widely used in conventional databases and in data stream processing to summarize massive data distributions. Previous work on constructing histograms on data streams with provable guarantees has not taken into account the workload characteristics of databases, in which some parts of the distribution are used more frequently than others. Conversely, previous work on constructing histograms that does exploit workload characteristics (and that has demonstrated the significant advantage of doing so) has not come with provable guarantees on the accuracy of the histograms or on the time and space bounds needed to obtain reasonable accuracy. We study the algorithmic complexity of constructing workload-optimal histograms on data streams. We present an algorithm for constructing a nearly optimal histogram in nearly linear time and polylogarithmic space, in one pass. In the more general cash register model where data is streamed...
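To make the notion of a workload-optimal histogram concrete, here is a minimal offline sketch. It is not the one-pass streaming algorithm summarized above. It assumes (this definition is not given in the abstract) that the workload is a non-negative weight per position, and that the error of a histogram is the weighted sum of squared differences between each data value and the value of the bucket covering it. Under that assumption, the best value for a bucket is the weighted mean of its points, and a classical dynamic program over bucket boundaries finds the optimum. The function name and its parameters are hypothetical.

```python
from itertools import accumulate


def workload_optimal_histogram(data, weights, num_buckets):
    """Offline DP sketch (assumed error model, not the streaming algorithm).

    Minimizes sum_i weights[i] * (data[i] - h[i])**2, where h is piecewise
    constant over at most `num_buckets` contiguous buckets.
    Returns (error, buckets) with buckets as half-open (start, end) ranges.
    """
    n = len(data)
    if n == 0 or len(weights) != n or num_buckets < 1:
        raise ValueError("need non-empty data, matching weights, >= 1 bucket")
    num_buckets = min(num_buckets, n)

    # Prefix sums of w, w*a and w*a^2 give each bucket's cost in O(1).
    pw = [0.0] + list(accumulate(weights))
    pwa = [0.0] + list(accumulate(w * a for w, a in zip(weights, data)))
    pwaa = [0.0] + list(accumulate(w * a * a for w, a in zip(weights, data)))

    def cost(i, j):
        """Weighted squared error of one bucket covering data[i:j]."""
        w = pw[j] - pw[i]
        if w == 0:
            return 0.0  # the workload never touches this bucket
        s = pwa[j] - pwa[i]
        q = pwaa[j] - pwaa[i]
        return max(q - s * s / w, 0.0)  # clamp tiny negative rounding error

    inf = float("inf")
    # best[b][j]: minimal error covering data[:j] with exactly b buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                candidate = best[b - 1][i] + cost(i, j)
                if candidate < best[b][j]:
                    best[b][j] = candidate
                    cut[b][j] = i

    # Walk the recorded cut points backwards to recover the buckets.
    buckets = []
    j = n
    for b in range(num_buckets, 0, -1):
        i = cut[b][j]
        buckets.append((i, j))
        j = i
    buckets.reverse()
    return best[num_buckets][n], buckets
```

This sketch takes O(n^2 B) time and O(nB) space for n points and B buckets, so it only illustrates the objective. A streaming algorithm of the kind described above must get by with a single pass and polylogarithmic space.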
S. Muthukrishnan, Martin Strauss, X. Zheng
Added: 27 Jun 2010
Updated: 27 Jun 2010
Type: Conference
Year: 2005
Where: ESA
Authors: S. Muthukrishnan, Martin Strauss, X. Zheng