Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
The goal of this paper is to assess the value of simple features that are widely available in off-the-shelf CORBA and Java platforms for the implementation of fault-tolerance mecha...
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire outp...
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M....
We study communication complexity of consensus in synchronous message-passing systems with processes prone to crashes. The goal in the consensus problem is to have all the nonfaul...
Bogdan S. Chlebus, Dariusz R. Kowalski, Michal Str...
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...