Sciweavers

DASFAA
2010
IEEE

Scalable Splitting of Massive Data Streams

Scalable execution of continuous queries over massive data streams often requires splitting input streams into parallel sub-streams over which query operators are executed in parallel. Automatic stream splitting is generally very difficult, as the optimal parallelization may depend on application semantics. To enable application-specific stream splitting, we introduce splitstream functions, in which the user specifies stream partitioning and replication non-procedurally. For high-volume streams, the stream splitting itself becomes a performance bottleneck. A cost model is introduced that estimates the performance of splitstream functions with respect to throughput and CPU usage. We implement parallel splitstream functions and relate experimental results to cost model estimates. Based on the results, a splitstream function called autosplit is proposed, which scales well for high degrees of parallelism and is robust for varying proportions of stream partitioning and replication. We show how...
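To make the two routing modes named in the abstract concrete, here is a minimal single-threaded sketch in Python. It is not the paper's splitstream or autosplit implementation, and it does not model the cost or parallelism the abstract discusses. The names split_stream, key_of, and is_broadcast are hypothetical, and the sketch assumes integer partitioning keys routed by a simple modulo.

```python
from typing import Any, Callable, Iterable, List


def split_stream(
    tuples: Iterable[Any],
    n_outputs: int,
    key_of: Callable[[Any], int],
    is_broadcast: Callable[[Any], bool],
) -> List[List[Any]]:
    """Split one input stream into n_outputs sub-streams (hypothetical helper).

    Each tuple is either replicated to every sub-stream (when is_broadcast
    returns True) or partitioned to exactly one sub-stream chosen by its key.
    """
    outputs: List[List[Any]] = [[] for _ in range(n_outputs)]
    for t in tuples:
        if is_broadcast(t):
            # Replication: every sub-stream receives a copy of the tuple.
            for out in outputs:
                out.append(t)
        else:
            # Partitioning: the integer key selects exactly one sub-stream.
            outputs[key_of(t) % n_outputs].append(t)
    return outputs


# Assumed example data: sensor readings are partitioned by sensor id,
# while control messages are replicated to all sub-streams.
stream = [
    ("reading", 0, 1.5),
    ("reading", 1, 2.5),
    ("config", None, "reset"),
    ("reading", 2, 3.5),
]
sub_streams = split_stream(
    stream,
    n_outputs=2,
    key_of=lambda t: t[1],
    is_broadcast=lambda t: t[0] == "config",
)
```

In this sketch the application decides per tuple whether it is partitioned or replicated, which is the kind of application-specific choice the abstract says automatic splitting cannot easily infer.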
Erik Zeitler, Tore Risch
Added: 17 May 2010
Updated: 17 May 2010
Type: Conference
Year: 2010
Where: DASFAA
Authors: Erik Zeitler, Tore Risch