Sciweavers

» Fault Tolerance in Message Passing and in Action
CCGRID
2008
IEEE
13 years 7 months ago
Fault Tolerance in Cluster Federations with O2P-CF
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Thomas Ropars, Christine Morin
CLUSTER
2004
IEEE
13 years 9 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas Hé...
ICDCS
1998
IEEE
13 years 9 months ago
Low-Overhead Protocols for Fault-Tolerant File Sharing
In this paper, we quantify the adverse effect of file sharing on the performance of reliable distributed applications. We demonstrate that file sharing incurs significant overhead...
Lorenzo Alvisi, Sriram Rao, Harrick M. Vin
FGCS
2008
13 years 5 months ago
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols
A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault tolerant progr...
Darius Buntinas, Camille Coti, Thomas Hérau...
CCS
2009
ACM
14 years 6 months ago
Unconditionally secure message transmission in arbitrary directed synchronous networks tolerating generalized mixed adversary
In this paper, we re-visit the problem of unconditionally secure message transmission (USMT) from a sender S to a receiver R, who are part of a distributed synchronous network, mo...
Kannan Srinathan, Arpita Patra, Ashish Choudhary, ...