Sciweavers

115 search results - page 3 / 23
» Transparent Fault Tolerance for Parallel Applications on Net...
Sort
View
IPPS
2007
IEEE
13 years 11 months ago
DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems
In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....
Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...
CCGRID
2008
IEEE
13 years 5 months ago
Fault Tolerance and Recovery of Scientific Workflows on Computational Grids
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Gopi Kandaswamy, Anirban Mandal, Daniel A. Reed
IPPS
2000
IEEE
13 years 8 months ago
Fault Tolerant Wide-Area Parallel Computing
Executing parallel applications across distributed networks introduces the problem of fault tolerance. A viable solution for fault tolerance must keep overhead manageable and not c...
Jon B. Weissman
IPPS
2003
IEEE
13 years 10 months ago
A Low Cost Fault Tolerant Packet Routing for Parallel Computers
This work presents a new switching mechanism to tolerate arbitrary faults in interconnection networks with a negligible implementation cost. Although our routing technique can be ...
Valentin Puente, José A. Gregorio, Ram&oacu...
CLUSTER
2003
IEEE
13 years 10 months ago
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...