Sciweavers

CCGRID
2006
IEEE
15 years 4 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
ICDCS
1996
IEEE
15 years 2 months ago
An Evaluation of the Amoeba Group Communication System
The Amoeba group communication system has two unique aspects: (1) it uses a sequencer-based protocol with negative acknowledgements for achieving a total order on all group messag...
M. Frans Kaashoek, Andrew S. Tanenbaum
TC
1998
14 years 9 months ago
A Primary-Backup Channel Approach to Dependable Real-Time Communication in Multihop Networks
Many applications require communication services with guaranteed timeliness and fault tolerance at an acceptable level of overhead. We present a scheme for restoring real-time c...
Seungjae Han, Kang G. Shin
JAVA
2001
Springer
15 years 2 months ago
A scalable, robust network for parallel computing
CX, a network-based computational exchange, is presented. The system’s design integrates variations of ideas from other researchers, such as work stealing, non-blocking tasks, e...
Peter R. Cappello, Dimitrios Mourloukos
OTM
2009
Springer
15 years 4 months ago
Evaluating Throughput Stability of Protocols for Distributed Middleware
Communication of large data volumes is a core functionality of distributed systems middleware, namely, for interconnecting components, for distributed computation and for fault tol...
Nuno Carvalho, José P. Oliveira, José...