
HPDC
1999
IEEE

Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations

This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in that the system itself is efficient, fault-tolerant, highly available, and dynamic, and it also provides fault tolerance and dynamicity to the application programs it runs. Starfish achieves these goals by combining group communication technology with checkpoint/restart. Its novel architecture is both flexible and portable, and it keeps group communication outside the critical data path to maximize performance.
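The separation described in the abstract can be illustrated with a toy in-process model. Group communication handles membership and failure notification, checkpoint/restart handles recovery, and application messages bypass the group layer entirely. This is a minimal sketch under stated assumptions, not Starfish's implementation or API. The classes `Worker`, `ControlGroup`, and `Runtime` are hypothetical, failures are injected by hand, and recovery uses a simple coordinated checkpoint that rolls every surviving process back.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One application process (hypothetical); holds some application state."""
    rank: int
    state: int = 0
    inbox: list = field(default_factory=list)


class ControlGroup:
    """Hypothetical stand-in for a group communication layer.

    It only tracks membership views and delivers each new view to every
    subscriber in the same order; it never carries application data.
    """

    def __init__(self, ranks):
        self.view_id = 0
        self.members = set(ranks)
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def report_failure(self, rank):
        # Install a new view without the failed rank and notify all listeners
        # sequentially (a real system would need agreement on this order).
        self.members.discard(rank)
        self.view_id += 1
        for cb in self.listeners:
            cb(self.view_id, frozenset(self.members))


class Runtime:
    """Toy runtime: direct data path plus a separate control/recovery path."""

    def __init__(self, n):
        self.workers = {r: Worker(r) for r in range(n)}
        self.group = ControlGroup(range(n))
        self.group.subscribe(self.on_view_change)
        self.checkpoint = None
        self.restarts = 0

    # Data path: point-to-point delivery that never touches ControlGroup.
    def send(self, src, dst, value):
        self.workers[dst].inbox.append((src, value))

    def step(self):
        # Each worker consumes its pending messages and updates its state.
        for w in self.workers.values():
            while w.inbox:
                _, value = w.inbox.pop(0)
                w.state += value

    # Recovery path: coordinated checkpoint and rollback.
    def take_checkpoint(self):
        # Assumes step() has drained all inboxes, so the snapshot is consistent.
        self.checkpoint = {r: copy.deepcopy(w) for r, w in self.workers.items()}

    def on_view_change(self, view_id, members):
        # Drop failed workers and roll survivors back to the last checkpoint
        # (or to a fresh state if no checkpoint has been taken yet).
        if self.checkpoint is None:
            self.workers = {r: Worker(r) for r in members}
        else:
            self.workers = {r: copy.deepcopy(self.checkpoint[r]) for r in members}
        self.restarts += 1


if __name__ == "__main__":
    rt = Runtime(3)
    rt.send(0, 1, 5)
    rt.send(2, 1, 7)
    rt.step()
    rt.take_checkpoint()         # worker 1 is checkpointed with state 12
    rt.send(0, 1, 100)
    rt.step()                    # worker 1 reaches 112, not yet checkpointed
    rt.group.report_failure(2)   # worker 2 crashes; survivors roll back
    print(sorted(rt.workers), rt.workers[1].state)
```

Because `send` and `step` never call into `ControlGroup`, the cost of the group layer is paid only on view changes, not on every application message. That is the intuition behind keeping group communication off the critical data path. Rolling back all survivors is just one possible recovery policy; a dynamic application could instead be notified of the new view and continue without restarting.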
Adnan Agbaria, Roy Friedman
Added 03 Aug 2010
Updated 03 Aug 2010
Type Conference
Year 1999
Where HPDC
Authors Adnan Agbaria, Roy Friedman