IPPS
1999
IEEE
The Performance of Coordinated and Independent Checkpointing
Checkpointing is a very effective technique to tolerate the occurrence of failures in distributed and parallel applications. The existing algorithms in the literature are basicall...
Luís Moura Silva, João Gabriel Silva
SC
2000
ACM
Scalable Fault-Tolerant Distributed Shared Memory
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...
Florin Sultan, Thu D. Nguyen, Liviu Iftode