Sciweavers

25 search results - page 1 / 5
SRDS
1996
IEEE
13 years 9 months ago
Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Parallel System
Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will ...
Anh Nguyen-Tuong, Andrew S. Grimshaw, Mark Hyett
ICPP
2000
IEEE
13 years 8 months ago
A Problem-Specific Fault-Tolerance Mechanism for Asynchronous, Distributed Systems
The idle computers on a local area, campus area, or even wide area network represent a significant computational resource--one that is, however, also unreliable, heterogeneous, an...
Adriana Iamnitchi, Ian T. Foster
PVM
2007
Springer
13 years 11 months ago
Using CMT in SCTP-Based MPI to Exploit Multiple Interfaces in Cluster Nodes
Many existing clusters use inexpensive Gigabit Ethernet and often have multiple interface cards to improve bandwidth and enhance fault tolerance. We investigate the use of Concurr...
Brad Penoff, Mike Tsai, Janardhan R. Iyengar, Alan...
IPPS
2005
IEEE
13 years 10 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
Sebastian Gerlach, Roger D. Hersch
DSN
2007
IEEE
13 years 11 months ago
Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance
Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point towards multi-threaded multi-core designs,...
Alex Shye, Tipp Moseley, Vijay Janapa Reddi, Josep...