As the scale of high performance computing (HPC) continues to grow, application fault resilience becomes crucial. To address this problem, we are working on the design of an adapt...
This work presents a software-implemented fault tolerance approach for building a reliable database application in a CORBA environment. Database applications have functional requi...
Domenico Cotroneo, Nicola Mazzocca, Luigi Romano, ...
Reliability is a major requirement for most safety-related systems. To meet this requirement, fault-tolerant techniques such as hardware replication and software re-execution are ...
Jia Huang, Jan Olaf Blech, Andreas Raabe, Christia...
We present S, the first system to provide transparent, lowoverhead application record-replay and the ability to go live from replayed execution. S i...
After a system crash, databases recover to the last committed transaction, but applications usually either crash or cannot continue. The Phoenix purpose is to enable application s...