We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
The SHETRAN physically-based distributed rainfall-runoff modelling system gives detailed simulations in time and space of water flow and sediment and solute transport in river cat...
In this paper, a task-parallel application is implemented with Ninf-G, a GridRPC system, and run for three months on the Grid testbed in Asia Pacific. The...
Tailorability is generally regarded as a key property of groupware systems due to the dynamics and differentiation of cooperative work. This article investigates the use of softwa...
We develop a widely applicable algorithm to solve the fault diagnosis problem in certain distributed-memory multiprocessor systems in which there are a limited number of faulty pr...