Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
We present and analyze two new communication libraries, cudaMPI and glMPI, that provide an MPI-like message passing interface to communicate data stored on the graphics cards of...
In this paper we report on features added to a parallel debugger to simplify the debugging of message passing programs. These features include replay, setting consistent breakpoin...
In this paper, we describe a rule-based message passing method to support developing collaborative applications, in which multiple users share resources in distributed environments...
This paper defines the basic notions of local and non-local tasks, and determines the minimum information about failures that is necessary to solve any non-local task in message-p...
Carole Delporte-Gallet, Hugues Fauconnier, Sam Tou...