Sciweavers

SAC
2008
ACM
13 years 5 months ago
Providing dependability for web services
Web services have been widely employed to allow interoperability among applications and/or technologies. However, the standard technologies and protocols which provide the foundat...
Jeferson L. R. Souza, Frank Siqueira
FGCS
2008
13 years 6 months ago
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols
A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault tolerant progr...
Darius Buntinas, Camille Coti, Thomas Hérau...
CCGRID
2008
IEEE
13 years 8 months ago
Fault Tolerance in Cluster Federations with O2P-CF
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Thomas Ropars, Christine Morin
CLUSTER
2002
IEEE
13 years 11 months ago
Design and Validation of Portable Communication Infrastructure for Fault-Tolerant Cluster Middleware
We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the ...
Ming Li, Wenchao Tao, Daniel Goldberg, Israel Hsu,...
CLUSTER
2004
IEEE
13 years 10 months ago
FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
As high performance clusters continue to grow in size, the mean time between failures shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
Gengbin Zheng, Lixia Shi, Laxmikant V. Kalé