With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Phoenix/App supports software components whose states are made persistent across a system crash via redo recovery, replaying logged interactions. Our initial prototype force logge...
This paper presents ReVive, a novel general-purpose rollback recovery mechanism for shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of avail...
Recent developments in the field of object-based fault tolerance and the advent of the first OMG FTCORBA compliant middleware raise new requirements for the design process of dist...
In this demonstration we present BRRL, a library for making distributed main-memory applications fault tolerant. BRRL is optimized for cloud applications with frequent points of c...
Tuan Cao, Benjamin Sowell, Marcos Antonio Vaz Sall...