Sciweavers

321 search results - page 11 / 65
» A Fault-Tolerance Protocol for Parallel Applications with Co...
Sort
View
IPPS
2007
IEEE
15 years 4 months ago
DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems
In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....
Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...
JPDC
2006
111views more  JPDC 2006»
14 years 10 months ago
Designing irregular parallel algorithms with mutual exclusion and lock-free protocols
Irregular parallel algorithms pose a significant challenge for achieving high performance because of the difficulty predicting memory access patterns or execution paths. Within an...
Guojing Cong, David A. Bader
GPC
2007
Springer
15 years 4 months ago
Fault Management in P2P-MPI
We present in this paper the recent developments done in P2P-MPI, a grid middleware, concerning the fault management, which covers fault-tolerance for applications and fault detect...
Stéphane Genaud, Choopan Rattanapoka
IPPS
2006
IEEE
15 years 4 months ago
Coordinated checkpoint from message payload in pessimistic sender-based message logging
Execution of MPI applications on Clusters and Grid deployments suffers from node and network failure that motivates the use of fault tolerant MPI implementations. Two category tec...
M. Aminian, Mohammad K. Akbari, Bahman Javadi
PVM
2009
Springer
15 years 4 months ago
VolpexMPI: An MPI Library for Execution of Parallel Applications on Volatile Nodes
The objective of this research is to convert ordinary idle PCs into virtual clusters for executing parallel applications. The paper introduces VolpexMPI that is designed to enable ...
Troy LeBlanc, Rakhi Anand, Edgar Gabriel, Jaspal S...