Sciweavers

» Performance under Failures of DAG-based Parallel Computing
TC
2008
13 years 5 months ago
Adaptive Fault Management of Parallel Applications for High-Performance Computing
As the scale of high-performance computing (HPC) continues to grow, failure resilience of parallel applications becomes crucial. In this paper, we present FT-Pro, an adaptive fault...
Zhiling Lan, Yawei Li
ICDCS
2006
IEEE
13 years 11 months ago
Load Unbalancing to Improve Performance under Autocorrelated Traffic
Qi Zhang, Ningfang Mi, Alma Riska, Evgenia Smirni
IPPS
2006
IEEE
13 years 11 months ago
Load balancing in the presence of random node failure and recovery
In many distributed computing systems that are prone to either induced or spontaneous node failures, the number of available computing resources is dynamically changing in a rando...
Sagar Dhakal, Majeed M. Hayat, Jorge E. Pezoa, Cha...
IPPS
1999
IEEE
13 years 9 months ago
Condition-Based Maintenance: Algorithms and Applications for Embedded High Performance Computing
Condition-based maintenance (CBM) seeks to generate a design for a new ship-wide CBM system that performs diagnoses and failure prediction on Navy shipboard machinery. Eventually, ...
Bonnie Holte Bennett, George D. Hadden
IPPS
2005
IEEE
13 years 11 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...