Sciweavers

116 search results - page 1 / 24
» A Communication Framework for Fault-Tolerant Parallel Execut...
Sort
View
LCPC
2009
Springer
13 years 9 months ago
A Communication Framework for Fault-Tolerant Parallel Execution
PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availabili...
Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Es...
IPPS
2007
IEEE
13 years 11 months ago
Self Adaptive Application Level Fault Tolerance for Parallel and Distributed Computing
Most application level fault tolerance schemes in literature are non-adaptive in the sense that the fault tolerance schemes incorporated in applications are usually designed witho...
Zizhong Chen, Ming Yang, Guillermo A. Francia III,...
IPPS
2005
IEEE
13 years 10 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
Sebastian Gerlach, Roger D. Hersch
ICPP
2000
IEEE
13 years 8 months ago
A Problem-Specific Fault-Tolerance Mechanism for Asynchronous, Distributed Systems
The idle computers on a local area, campus area, or even wide area network represent a significant computational resource--one that is, however, also unreliable, heterogeneous, an...
Adriana Iamnitchi, Ian T. Foster
DICS
2006
13 years 8 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules: A Programmer's Perspective
Dynamic Parallel Schedules (DPS) is a flow graph based framework for developing parallel applications on clusters of workstations. The DPS flow graph execution model enables automa...
Sebastian Gerlach, Basile Schaeli, Roger D. Hersch