Sciweavers

86 search results - page 3 / 18
» Hybrid checkpointing for parallel applications in cluster fe...
Sort
View
PVM
2005
Springer
13 years 10 months ago
New User-Guided and ckpt-Based Checkpointing Libraries for Parallel MPI Applications
We present design and implementation details as well as performance results for two new parallel checkpointing libraries developed by us for parallel MPI applications. The first o...
Pawel Czarnul, Marcin Fraczak
ICPP
2009
IEEE
13 years 12 months ago
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems
—Clusters and applications continue to grow in size while their mean time between failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...
Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabales...
GRID
2004
Springer
13 years 10 months ago
Hybrid Preemptive Scheduling of MPI Applications on the Grids
— Time sharing between all the users of a Grid is a major issue in cluster and Grid integration. Classical Grid architecture involves a higher level scheduler which submits non o...
Aurelien Bouteiller, Hinde-Lilia Bouziane, Thomas ...
CLUSTER
2003
IEEE
13 years 10 months ago
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...
IPPS
2005
IEEE
13 years 11 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...