Sciweavers

10 search results - page 1 / 2
» Coherence-based Coordinated Checkpointing for Software Distr...
Sort
View
ICDCS
2000
IEEE
13 years 9 months ago
Coherence-based Coordinated Checkpointing for Software Distributed Shared Memory Systems
Fault-tolerant techniques that can cope with system failures in software distributed shared memory (SDSM) are essential for creating productive and highly available parallel compu...
Angkul Kongmunvattana, Santipong Tanchatchawal, Ni...
SC
2000
ACM
13 years 9 months ago
Scalable Fault-Tolerant Distributed Shared Memory
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...
Florin Sultan, Thu D. Nguyen, Liviu Iftode
SRDS
1994
IEEE
13 years 8 months ago
Coordinated Checkpointing-Rollback Error Recovery for Distributed Shared Memory Multicomputers
Most recovery schemes that have been proposed for Distributed Shared Memory (DSM) systems require unnecessarily high checkpointing frequency and checkpoint traffic, which are sens...
G. Janakiraman, Yuval Tamir
SRDS
1999
IEEE
13 years 8 months ago
Logging and Recovery in Adaptive Software Distributed Shared Memory Systems
Software distributed shared memory (DSM) improves the programmability of message-passing machines and workclusters by providing a shared memory abstract (i.e., a coherent global a...
Angkul Kongmunvattana, Nian-Feng Tzeng
ISCA
2002
IEEE
115views Hardware» more  ISCA 2002»
13 years 9 months ago
ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors
This paper presents ReVive, a novel general-purpose rollback recovery mechanism for shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of avail...
Milos Prvulovic, Josep Torrellas, Zheng Zhang