Performing experimental evaluation of fault tolerant distributed systems is a complex and tedious task, and automating as much as possible of the execution and evaluation of exper...
This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in ...
One of the mostsoughtaftersoftware innovation of thisdecade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and ...
: The distributed recovery block (DRB) scheme is a widely applicable approach for realizing both hardware and software fault tolerance in real-time distributed and parallel compute...
In this paper, a task parallel application is implemented with Ninf-G which is a GridRPC system, and experimented on, using the Grid testbed in Asia Pacific, for three months. The...