Sciweavers

DICS
2006

Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules: A Programmer's Perspective

Dynamic Parallel Schedules (DPS) is a flow-graph-based framework for developing parallel applications on clusters of workstations. The DPS flow graph execution model enables automatic pipelined parallel execution of applications. DPS supports graceful degradation of parallel applications when nodes fail. The fault-tolerance mechanism relies on a set of backup threads stored in the volatile storage of alternate nodes. These backup threads are kept up to date by duplicating transmitted data objects and by periodic checkpointing. The state of a failed node can be reconstructed on its backup threads by re-executing the application from the last checkpoint. A valid execution order is automatically deduced from the flow graph. Adding fault tolerance to a DPS application requires only minor changes to the application's source code. The present contribution focuses on the development of fault-tolerant parallel applications with DPS from a programmer's perspec...
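The recovery idea described in the abstract (duplicate each transmitted data object to a backup, checkpoint periodically, then re-execute from the last checkpoint after a failure) can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not the DPS API: the names `BackupThread`, `accumulate`, and `run_with_failure` are hypothetical, the operation is assumed to be deterministic, and the replay simply uses arrival order rather than an order deduced from a flow graph.

```python
import copy


class BackupThread:
    """Hypothetical backup held in the volatile memory of an alternate node."""

    def __init__(self, initial_state):
        # Last checkpointed state received from the primary.
        self.checkpoint = copy.deepcopy(initial_state)
        # Data objects duplicated to the backup since that checkpoint.
        self.pending = []

    def duplicate(self, data_object):
        # Every object sent to the primary is also sent here.
        self.pending.append(data_object)

    def store_checkpoint(self, state):
        # A new checkpoint makes the earlier duplicates unnecessary.
        self.checkpoint = copy.deepcopy(state)
        self.pending.clear()

    def recover(self, operation):
        # Rebuild the lost state by re-executing the operation on every
        # object received since the last checkpoint (arrival order assumed).
        state = copy.deepcopy(self.checkpoint)
        for data_object in self.pending:
            state = operation(state, data_object)
        return state


def accumulate(state, data_object):
    """Toy deterministic operation: running sum and object count."""
    return {"sum": state["sum"] + data_object, "count": state["count"] + 1}


def run_with_failure(data_objects, checkpoint_every, fail_after):
    """Process objects on a simulated primary that fails after `fail_after`
    objects; return (state lost with the primary, state rebuilt by backup)."""
    state = {"sum": 0, "count": 0}
    backup = BackupThread(state)
    for index, data_object in enumerate(data_objects[:fail_after], start=1):
        backup.duplicate(data_object)
        state = accumulate(state, data_object)
        if index % checkpoint_every == 0:
            backup.store_checkpoint(state)
    return state, backup.recover(accumulate)
```

Calling `run_with_failure([1, 2, 3, 4, 5, 6, 7], checkpoint_every=3, fail_after=5)` checkpoints after the third object, so the simulated backup only has to replay the fourth and fifth objects to reach the state the primary held when it failed.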
Sebastian Gerlach, Basile Schaeli, Roger D. Hersch
Added 22 Aug 2010
Updated 22 Aug 2010
Type Conference
Year 2006
Where DICS