Grid applications need to be fault tolerant, malleable, and migratable. In previous work, we have presented orphan saving, an efficient mechanism addressing these issues for divide...
A grid checkpointing service providing migration and transparent fault tolerance is important for distributed and parallel applications executed in heterogeneous grids. I...
In this paper, we describe the design and implementation of two mechanisms for fault tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long-running jobs. Increasing fault tolera...