Sciweavers

HPCA
2003
IEEE

Dynamic Data Replication: An Approach to Providing Fault-Tolerant Shared Memory Clusters

14 years 4 months ago
Dynamic Data Replication: An Approach to Providing Fault-Tolerant Shared Memory Clusters
A challenging issue in today's server systems is to transparently deal with failures and application-imposed requirements for continuous operation. In this paper we address this problem in shared virtual memory (SVM) clusters at ramming abstraction layer. We design extensions to an existing SVM protocol that has been tuned for lowlatency, high-bandwidth interconnects and SMP nodes and we achieve reliability through dynamic replication of application shared data and protocol information. Our extensions allow us to tolerate single (or multiple, but not simultaneous) node failures. We implement our extensions on a stateof-the-art cluster and we evaluate the common, failure-free case. We find that, although the complexity of our protocol is substantially higher than its failure-free counterpart, by taking advantage of architectural features of modern systems our approach imposes low overhead and can be employed for transparently dealing with system failures.
Rosalia Christodoulopoulou, Reza Azimi, Angelos Bi
Added 01 Dec 2009
Updated 01 Dec 2009
Type Conference
Year 2003
Where HPCA
Authors Rosalia Christodoulopoulou, Reza Azimi, Angelos Bilas
Comments (0)