We have created ZapC, a novel system for transparent coordinated checkpoint-restart of distributed network applications on commodity clusters. ZapC provides a thin virtualization ...
Record and Replay (RR) is a software-based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes
Replication is one of the prominent approaches for obtaining fault tolerance. Implementing replication on commodity hardware and in a transparent fashion, i.e., without changing t...
Despite a large research effort, software distributed shared memory systems have not been widely used to run parallel applications across clusters of computers. The higher perform...
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides keeping system state consistent after a failure, o...