With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
The Amoeba group communication system has two unique aspects: (1) it uses a sequencer-based protocol with negative acknowledgements for achieving a total order on all group messag...
—Many applications require communication services with guaranteed timeliness and fault tolerance at an acceptable level of overhead. We present a scheme for restoring real-time c...
CX, a network-based computational exchange, is presented. The system’s design integrates variations of ideas from other researchers, such as work stealing, non-blocking tasks, e...
Communication of large data volumes is a core functionality of distributed systems middleware, namely, for interconnecting components, for distributed computation and for fault tol...