
IPPS 2007, IEEE
The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...