Sciweavers

5 search results - page 1 / 1
» Hybrid Checkpointing for MPI Jobs in HPC Environments
Sort
View
ICPADS
2010
IEEE
13 years 3 months ago
Hybrid Checkpointing for MPI Jobs in HPC Environments
As the core count in high-performance computing systems keeps increasing, faults are becoming common place. Checkpointing addresses such faults but captures full process images ev...
Chao Wang, Frank Mueller, Christian Engelmann, Ste...
HPDC
2009
IEEE
13 years 11 months ago
Interconnect agnostic checkpoint/restart in open MPI
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
ICDCS
2012
IEEE
11 years 7 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (1015 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
CCGRID
2006
IEEE
13 years 11 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
GRID
2004
Springer
13 years 10 months ago
Hybrid Preemptive Scheduling of MPI Applications on the Grids
— Time sharing between all the users of a Grid is a major issue in cluster and Grid integration. Classical Grid architecture involves a higher level scheduler which submits non o...
Aurelien Bouteiller, Hinde-Lilia Bouziane, Thomas ...