Sciweavers

EDBT
2008
ACM

Optimizing away joins on data streams

Abstract. Monitoring aggregates on IP traffic data streams is a compelling application for data stream management systems. Often, such streaming aggregation queries involve joining multiple streams (e.g., streams of SYN and ACK packets) using temporal join conditions (e.g., within 5 seconds), followed by computation of aggregates (e.g., COUNT) over temporal tumbling windows (e.g., every 5 minutes). While such a query expression is natural, its evaluation over high-speed IP traffic data streams is infeasible in practice. In this paper, we develop rewriting techniques for streaming aggregation queries that identify conditions under which such joins can be optimized away, while providing error bounds for results of the rewritten queries. The basis of the optimization is a powerful but decidable theory in which constraints over data streams can be formulated. The result error bounds are specified as functions of the boundary effects incurred during query rewriting.
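To make the query shape in the abstract concrete, the following is a minimal sketch of the naive join-then-aggregate evaluation: SYN and ACK records are joined when the ACK follows the SYN within 5 seconds, and matched pairs are counted per 5-minute tumbling window. It is not the paper's rewriting technique. The tuple schema (timestamp, flow id), the function name count_matched_pairs, and the rule that a pair belongs to the window of its SYN timestamp are all assumptions made for illustration.

```python
from collections import Counter

JOIN_WINDOW = 5   # seconds: temporal join condition from the abstract's example
TUMBLE = 300      # seconds: 5-minute tumbling window from the abstract's example


def count_matched_pairs(syns, acks):
    """Naive join-then-aggregate: COUNT of (SYN, ACK) pairs per tumbling window.

    syns, acks: lists of (timestamp_seconds, flow_id) tuples (hypothetical schema).
    A pair matches when the flow ids are equal and the ACK arrives at most
    JOIN_WINDOW seconds after the SYN. Each pair is assigned to the window
    containing its SYN timestamp (an assumption of this sketch).
    """
    counts = Counter()
    for syn_ts, syn_flow in syns:
        for ack_ts, ack_flow in acks:
            if syn_flow == ack_flow and 0 <= ack_ts - syn_ts <= JOIN_WINDOW:
                counts[syn_ts // TUMBLE] += 1
    return dict(counts)


# Made-up sample data: flow "b" has its SYN at t=298 and its ACK at t=301,
# so the matched pair straddles the window boundary at t=300.
syns = [(10, "a"), (298, "b"), (400, "c")]
acks = [(12, "a"), (301, "b"), (420, "c")]
result = count_matched_pairs(syns, acks)
print(result)
```

The nested loop compares every SYN with every ACK, which hints at why evaluating the join directly is costly on high-rate streams. In the sample data, flow "b" is a pair whose two packets fall in different tumbling windows, so the count for a window depends on how such pairs are assigned.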
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2008
Where EDBT
Authors Lukasz Golab, Theodore Johnson, Nick Koudas, Divesh Srivastava, David Toman