Current processor allocation techniques for highly parallel systems are based on centralized front-end based algorithms. As a result, the applied strategies are restricted to stat...
Abstract. Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
Exception handling is one of the popular means used for improving dependability and supporting recovery in the ServiceOriented Architecture (SOA). This practical experience paper ...
Anatoliy Gorbenko, Alexander Romanovsky, Vyachesla...
This paper studies non-cryptographic authenticated broadcast in radio networks subject to malicious failures. We introduce two protocols that address this problem. The first, Nei...
Dan Alistarh, Seth Gilbert, Rachid Guerraoui, Zark...
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (1015 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...