Highly-Available, Fault-Tolerant, Parallel Dataflows

We present a technique that masks failures in a cluster to provide high availability and fault tolerance for long-running, parallelized dataflows. We can use these dataflows to implement a variety of continuous query (CQ) applications that require high-throughput, 24x7 operation. Examples include network monitoring, phone call processing, click-stream processing, and online financial analysis. Our main contribution is a scheme that carefully integrates traditional query processing techniques for partitioned parallelism with the process-pairs approach for high availability. This delicate integration allows us to tolerate failures of portions of a parallel dataflow without sacrificing result quality. Upon failure, our technique provides quick fail-over and automatically recovers the lost pieces on the fly. This piecemeal recovery provides minimal disruption to the ongoing dataflow computation and improved reliability as compared to the straightforward application of the process-pairs ...
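As a rough illustration of the idea in the abstract, the sketch below combines hash partitioning with a process pair per partition: each partition keeps two replicas that consume the same input, a surviving replica keeps answering after a failure, and only the lost replica is rebuilt from its partner. This is a toy in-memory model under stated assumptions, not the authors' implementation. All names (`Replica`, `PartitionPair`, `ParallelDataflow`, `route`, `recover`) are hypothetical, the operator is a simple per-key counter, and failure detection, networking, and in-flight data are ignored.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a partition's operator state: a running count per key."""
    counts: dict = field(default_factory=dict)
    alive: bool = True

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1


class PartitionPair:
    """Hypothetical process pair: two replicas fed the same input."""

    def __init__(self):
        self.replicas = [Replica(), Replica()]

    def process(self, key):
        # Every live replica sees every tuple, so either one can take over.
        for replica in self.replicas:
            if replica.alive:
                replica.process(key)

    def fail(self, index):
        # Simulate a crash of one replica (assumes its partner is still alive).
        self.replicas[index].alive = False

    def recover(self):
        # Piecemeal recovery: rebuild only the dead replica from the survivor.
        survivor = next(r for r in self.replicas if r.alive)
        for i, replica in enumerate(self.replicas):
            if not replica.alive:
                self.replicas[i] = Replica(copy.deepcopy(survivor.counts))

    def result(self):
        # Fail-over: read from whichever replica is still alive.
        return next(r for r in self.replicas if r.alive).counts


class ParallelDataflow:
    """Partitioned parallelism: keys are split across independent pairs."""

    def __init__(self, num_partitions):
        self.pairs = [PartitionPair() for _ in range(num_partitions)]

    def route(self, key):
        # Deterministic toy hash so the same key always hits the same pair.
        return self.pairs[sum(key.encode()) % len(self.pairs)]

    def process(self, key):
        self.route(key).process(key)

    def results(self):
        merged = {}
        for pair in self.pairs:
            merged.update(pair.result())
        return merged


flow = ParallelDataflow(2)
for key in ["a", "b", "a"]:
    flow.process(key)
flow.route("a").fail(0)      # one replica of one partition crashes
flow.process("a")            # the dataflow keeps running on the survivor
flow.route("a").recover()    # only the lost replica is rebuilt
flow.process("a")
print(flow.results())
```

In this sketch, a failure in one partition never touches the other partitions, which is the sense in which recovery is piecemeal; a real system would also have to handle tuples in flight during fail-over, which the toy model omits.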
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2004
Authors Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brewer