Sciweavers

1256 search results - page 2 / 252
» On Coordinated Checkpointing in Distributed Systems
SC
2000
ACM
13 years 10 months ago
Scalable Fault-Tolerant Distributed Shared Memory
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...
Florin Sultan, Thu D. Nguyen, Liviu Iftode
WAIM
2004
Springer
13 years 11 months ago
A Low-Cost Checkpointing Scheme for Mobile Computing Systems
In distributed computing systems, processes on different hosts take checkpoints to survive failures. For mobile computing systems, due to certain new characteristics, conventional d...
Guohui Li, Hongya Wang, Jixiong Chen
CLUSTER
2003
IEEE
13 years 11 months ago
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
Large clusters, high-availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...
IPPS
2007
IEEE
14 years 2 days ago
An optimistic checkpointing and selective message logging approach for consistent global checkpoint collection in distributed sy...
In this paper, we present an asynchronous consistent global checkpoint collection algorithm which prevents contention for network storage at the file server and hence reduces the...
Qiangfeng Jiang, D. Manivannan