Checkpointing and rollback recovery is a very effective technique to tolerate transient faults and preventive shutdowns. In the past, most of the checkpointing schemes published i...
Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
Dual core architectures are commonly used to establish fault tolerance on the node level. Since comparison is usually performed for the outputs only, no precise diagnostic informa...
Christian El Salloum, Andreas Steininger, Peter Tu...
This paper presents novel service-based Grid data management middleware that leverages standards defined by WSRF specifications to create and manage dynamic Grid file system sessi...
Ming Zhao 0002, Vineet Chadha, Renato J. O. Figuei...
— Internet Service Providers design their network with resiliency in mind, having multiple paths towards external IP subnets available at the borders of their network. However, w...