The probability of failures in software distributed shared memory (SDSM) increases as the system size grows. This paper introduces a new, efficient message logging technique, call...
Software distributed shared memory (DSM) improves the programmability of message-passing machines and workstation clusters by providing a shared-memory abstraction (i.e., a coherent global a...
In this paper, we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called prefetch-based crash recovery (PCR), for software ...
As software distributed shared memory (DSM) systems become attractive on larger clusters, the focus of attention moves toward improving the reliability of systems. In this paper, w...
This paper presents ReVive, a novel general-purpose rollback recovery mechanism for shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of avail...