Sciweavers

22 search results - page 1 / 5
IEEEHPCS
2010
13 years 2 months ago
Using replication and checkpointing for reliable task management in computational Grids
In grid computing systems, providing fault-tolerance is required for both scientific computation and file-sharing to increase their reliability. In previous works, several mechani...
Sangho Yi, Derrick Kondo, Bongjae Kim, Geunyoung P...
ICCS
2007
Springer
13 years 11 months ago
Providing Fault-Tolerance in Unreliable Grid Systems Through Adaptive Checkpointing and Replication
Abstract. As grids typically consist of autonomously managed subsystems with strongly varying resources, fault-tolerance forms an important aspect of the scheduling process of appl...
Maria Chtepen, Filip H. A. Claeys, Bart Dhoedt, Fi...
GRID
2004
Springer
13 years 10 months ago
Checkpoint and Restart for Distributed Components in XCAT3
With the advent of Grid computing, more and more high-end computational resources become available for use to a scientist. While this opens up new avenues for scientific research,...
Sriram Krishnan, Dennis Gannon
HPCC
2009
Springer
13 years 2 months ago
Graph-Based Task Replication for Workflow Applications
Abstract. The Grid is a heterogeneous and dynamic environment which enables distributed computation. This makes it a technology prone to failures. Some related work uses replicati...
Raúl Sirvent, Rosa M. Badia, Jesús L...
HPDC
2008
IEEE
13 years 11 months ago
Dynasa: adapting grid applications to safety using fault-tolerant methods
Grid applications have been prone to encountering problems such as failures or malicious attacks during execution, due to their distributed and large-scale features. The applicati...
Xuanhua Shi, Jean-Louis Pazat, Eric Rodriguez, Hai...