Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
The automatic construction of classi ers programs able to correctly classify data collected from the real world is one of the major problems in pattern recognition and in a wide ar...
Cosimo Anglano, Attilio Giordana, Giuseppe Lo Bell...
A major hurdle facing data intensive grid applications is the appropriate handling of failures that occur in the grid-environment. Implementing the fault-tolerance transparently a...
Abstract. A grid checkpointing service providing migration and transparent fault tolerance is important for distributed and parallel applications executed in heterogeneous grids. I...
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal compute...
Paul S. LeMahieu, Vasken Bohossian, Jehoshua Bruck