
SRDS
1996
IEEE

Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Parallel System

Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an extension of the data-flow model. The functional nature of data-flow enables both parallelism and fault-tolerance. In this paper, we exploit the data-flow underpinning of Mentat to provide easy-to-use and transparent fault-tolerance. We present results on both a small-scale network and a wide-area heterogeneous environment that consists of three sites: the National Center for Supercomputing Applications, the University of Virginia and the NASA Langley Research Center.
Anh Nguyen-Tuong, Andrew S. Grimshaw, Mark Hyett
Added 07 Aug 2010
Updated 07 Aug 2010
Type Conference
Year 1996
Where SRDS