Sciweavers

354 search results - page 2 / 71
» Self Adaptive Application Level Fault Tolerance for Parallel...
Sort
View
IPPS
2008
IEEE
13 years 10 months ago
Enhancing application robustness through adaptive fault tolerance
As the scale of high performance computing (HPC) continues to grow, application fault resilience becomes crucial. To address this problem, we are working on the design of an adapt...
Zhiling Lan, Yawei Li, Ziming Zheng, Prashasta Guj...
DSN
2007
IEEE
13 years 10 months ago
Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance
Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point towards multi-threaded multi-core designs,...
Alex Shye, Tipp Moseley, Vijay Janapa Reddi, Josep...
IPPS
1999
IEEE
13 years 8 months ago
An Adaptive, Fault-Tolerant Implementation of BSP for JAVA-Based Volunteer Computing Systems
Abstract. In recent years, there has been a surge of interest in Javabased volunteer computing systems, which aim to make it possible to build very large parallel computing network...
Luis F. G. Sarmenta
ICPP
2000
IEEE
13 years 8 months ago
A Fault-Tolerant Adaptive and Minimal Routing Approach in n-D Meshes
In this paper a sufficient condition is given for minimal routing in n-dimensional (n-D) meshes with faulty nodes contained in a set of disjoint fault regions. It is based on an ...
Jie Wu