Checkpointing of parallel applications can be used as the core technology to provide process migration. Both, checkpointing and migration, are an important issue for parallel appl...
To support incremental replay of message-passing applications, processes must periodically checkpoint and the content of some messages must be logged, to break dependencies of the...
Fault-tolerant techniques that can cope with system failures in software distributed shared memory (SDSM) are essential for creating productive and highly available parallel compu...
PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availabili...
Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Es...
PDE solvers using Adaptive Mesh Refinement on block structured grids are some of the most challenging applications to adapt to massively parallel computing environments. We descr...
Brian van Straalen, John Shalf, Terry J. Ligocki, ...