Sciweavers

GI
2004
Springer

Crash Management for Distributed Parallel Systems

With the growing complexity of parallel architectures, the probability of system failures grows as well. One approach to coping with this problem is self-healing, one of the self-x features of organic computing. Self-healing in this context means that computer clusters should detect and handle failures automatically. This paper presents a self-healing mechanism based on checkpointing, so that a cluster remains operative even if some sites or the connections between them fail. The proposed method has been implemented and tested on the Self Distributing Virtual Machine (SDVM).
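The general idea of checkpoint-based self-healing can be illustrated with a minimal sketch. This is not the SDVM's implementation, which the abstract does not detail. The names Site, Cluster, checkpoint and heal are hypothetical, as are the "buddy" placement of checkpoints and the use of a simple alive flag in place of real failure detection.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical cluster site; not the SDVM's actual data model."""
    name: str
    state: dict = field(default_factory=dict)  # task id -> progress value
    alive: bool = True  # stands in for a real failure detector


class Cluster:
    def __init__(self, names):
        self.sites = {n: Site(n) for n in names}
        # crashed-site name -> (name of site holding the copy, saved state)
        self.checkpoints = {}

    def live_sites(self):
        return [s for s in self.sites.values() if s.alive]

    def checkpoint(self):
        """Copy each live site's state to the next live site (its buddy)."""
        live = self.live_sites()
        if len(live) < 2:
            return  # no second site available to hold a copy
        for i, site in enumerate(live):
            buddy = live[(i + 1) % len(live)]
            self.checkpoints[site.name] = (buddy.name, copy.deepcopy(site.state))

    def heal(self):
        """Restore the last checkpoint of every crashed site on its buddy.

        Assumes task ids are unique across sites, so merging cannot clash.
        Returns the names of the sites whose work was recovered.
        """
        recovered = []
        for name, site in self.sites.items():
            if site.alive or name not in self.checkpoints:
                continue
            holder_name, saved = self.checkpoints[name]
            holder = self.sites[holder_name]
            if holder.alive:  # if the buddy is also down, nothing is restored
                holder.state.update(saved)
                del self.checkpoints[name]
                recovered.append(name)
        return recovered


cluster = Cluster(["a", "b", "c"])
cluster.sites["a"].state["task-1"] = 10
cluster.sites["b"].state["task-2"] = 20
cluster.checkpoint()
cluster.sites["a"].state["task-1"] = 11  # progress made after the checkpoint
cluster.sites["a"].alive = False         # site "a" crashes
print(cluster.heal())                    # names of recovered sites
print(cluster.sites["b"].state)          # buddy now also holds task-1
```

In this sketch, work done after the last checkpoint is lost on a crash, which is the usual trade-off between checkpoint frequency and recovery cost.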
Jan Haase, Frank Eschmann
Added: 01 Jul 2010
Updated: 01 Jul 2010
Type: Conference
Year: 2004
Where: GI
Authors: Jan Haase, Frank Eschmann