Sciweavers

63 search results - page 3 / 13
» Adaptive incremental checkpointing for massively parallel sy...
Sort
View
IPPS
1996
IEEE
13 years 10 months ago
CoCheck: Checkpointing and Process Migration for MPI
Checkpointing of parallel applications can be used as the core technology to provide process migration. Both, checkpointing and migration, are an important issue for parallel appl...
Georg Stellner
IPPS
1999
IEEE
13 years 10 months ago
An Efficient Logging Algorithm for Incremental Replay of Message
To support incremental replay of message-passing applications, processes must periodically checkpoint and the content of some messages must be logged, to break dependencies of the...
Franco Zambonelli
ICDCS
2000
IEEE
13 years 10 months ago
Coherence-based Coordinated Checkpointing for Software Distributed Shared Memory Systems
Fault-tolerant techniques that can cope with system failures in software distributed shared memory (SDSM) are essential for creating productive and highly available parallel compu...
Angkul Kongmunvattana, Santipong Tanchatchawal, Ni...
LCPC
2009
Springer
13 years 10 months ago
A Communication Framework for Fault-Tolerant Parallel Execution
PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availabili...
Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Es...
IPPS
2009
IEEE
14 years 20 days ago
Scalability challenges for massively parallel AMR applications
PDE solvers using Adaptive Mesh Refinement on block structured grids are some of the most challenging applications to adapt to massively parallel computing environments. We descr...
Brian van Straalen, John Shalf, Terry J. Ligocki, ...