Record and Replay (RR) is a software-based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes
As computational clusters increase in size, their mean time to failure decreases. Typically, checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
A network G* is called a random-fault-tolerant (RFT) network for a network G if G* contains a fault-free isomorphic copy of G with high probability even if each processor fails indepe...
This paper describes the design and implementation of a fault-tolerant CORBA naming service, CosNamingFT. Every CORBA object is accessed through its Interoperable Object Referenc...
Lau Cheuk Lung, Joni da Silva Fraga, Jean-Marie Fa...
Maintaining mobile agent availability in the presence of agent server crashes is a challenging issue since developers normally have no control over remote agent servers. A popular...