Sciweavers

4 search results - page 1 / 1
IPPS
1996
IEEE
13 years 7 months ago
CoCheck: Checkpointing and Process Migration for MPI
Checkpointing of parallel applications can be used as the core technology to provide process migration. Both checkpointing and migration are important issues for parallel appl...
Georg Stellner
CCGRID
2007
IEEE
13 years 10 months ago
Dynamic Malleability in Iterative MPI Applications
Malleability enables a parallel application’s execution system to split or merge processes modifying granularity. While process migration is widely used to adapt applications to...
Kaoutar El Maghraoui, Travis J. Desell, Boleslaw K...
CCGRID
2011
IEEE
12 years 7 months ago
High Performance Pipelined Process Migration with RDMA
Coordinated Checkpoint/Restart (C/R) is a widely deployed strategy to achieve fault-tolerance. However, C/R by itself is not capable enough to meet the demands of upcoming exasc...
Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavi...
ICS
2007
Tsinghua U.
13 years 9 months ago
Proactive fault tolerance for HPC with Xen virtualization
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...