This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...
Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure, during an application's executio...
This paper presents ReVive, a novel general-purpose rollback recovery mechanism for shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of avail...