Sciweavers

SRDS
1999
IEEE

Logging and Recovery in Adaptive Software Distributed Shared Memory Systems

Software distributed shared memory (DSM) improves the programmability of message-passing machines and workstation clusters by providing a shared memory abstraction (i.e., a coherent global address space) to programmers. As in any distributed system, however, the probability of software DSM failures increases as the system size grows. This paper presents a new, efficient logging protocol for adaptive software DSM (ADSM), called adaptive logging (AL). It is suitable for both coordinated and independent checkpointing since it speeds up the recovery process and eliminates the unbounded rollback problem associated with independent checkpointing. By leveraging the existing coherence data maintained by ADSM, our AL protocol adapts to log only the unrecoverable data (data that cannot be recreated or retrieved after a failure) necessary for correct recovery, reducing both the number of messages logged and the amount of logged data. We have performed experiments on a cluster of eight Sun Ultra-5 workstations, com...
Angkul Kongmunvattana, Nian-Feng Tzeng
Added 04 Aug 2010
Updated 04 Aug 2010
Type Conference
Year 1999
Where SRDS
Authors Angkul Kongmunvattana, Nian-Feng Tzeng