Transparent system support for software fault tolerance reduces performance in general and precludes application-specific optimizations in particular. In contrast, explicit support...
Quorum protocols offer several benefits when used to maintain replicated data, but techniques for reducing overheads associated with them have not been explored in detail. It is d...
Lei Kong, Deepak J. Manohar, Mustaque Ahamad, Arun...
A popular approach to guarantee fault tolerance in safety-critical applications is to run the application on two processors. A checkpoint is inserted at the completion of the prim...
We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to im...
Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brew...
In recent years, an increasing number of safety-critical tasks have been entrusted to computer systems. In particular, safety-critical computer-based applications are hitting ...
Maurizio Rebaudengo, Matteo Sonza Reorda, Marco To...