Sciweavers

129 search results for: Adaptation point analysis for computation migration checkpoi...
ICPP
1996
IEEE
13 years 9 months ago
MpPVM: A Software System for Non-Dedicated Heterogeneous Computing
This paper presents the design and preliminary implementation of MpPVM, a software system that supports process migration for PVM application programs in a non-dedicated heterogen...
Kasidit Chanchio, Xian-He Sun
IPPS
2008
IEEE
13 years 11 months ago
Enhancing application robustness through adaptive fault tolerance
As the scale of high performance computing (HPC) continues to grow, application fault resilience becomes crucial. To address this problem, we are working on the design of an adapt...
Zhiling Lan, Yawei Li, Ziming Zheng, Prashasta Guj...
IEEEHPCS
2010
13 years 3 months ago
Using replication and checkpointing for reliable task management in computational Grids
In grid computing systems, providing fault-tolerance is required for both scientific computation and file-sharing to increase their reliability. In previous works, several mechani...
Sangho Yi, Derrick Kondo, Bongjae Kim, Geunyoung P...
CCGRID
2006
IEEE
13 years 11 months ago
Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing
As the scale of cluster computing grows, it is becoming hard for long-running applications to complete without facing failures on large-scale clusters. To address this issue, chec...
Yawei Li, Zhiling Lan
CCGRID
2007
IEEE
13 years 11 months ago
Dynamic Malleability in Iterative MPI Applications
Malleability enables a parallel application’s execution system to split or merge processes modifying granularity. While process migration is widely used to adapt applications to...
Kaoutar El Maghraoui, Travis J. Desell, Boleslaw K...