Sciweavers

CCGRID
2009
IEEE

Performance under Failures of DAG-based Parallel Computing

As the scale and complexity of parallel systems continue to grow, failures become an increasingly inevitable part of running large-scale applications. In this research, we present an analytical study that estimates the execution time of directed acyclic graph (DAG) based scientific applications in the presence of failures and provides a guideline for performance optimization. The study is fourfold. We first introduce a performance model to predict individual subtask computation time under failures. Next, a layered, iterative approach is adopted to transform a DAG into a layered DAG, which reflects full dependencies among all the subtasks. Then, the expected execution time of the DAG under failures is derived based on stochastic analysis. Unlike existing models, this newly proposed performance model provides both the variance and the distribution. It is practical and can be put to real use. Finally, based on the model, performance optimization, weak point identification and enhancement a...
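The abstract does not spell out how the layered DAG is built, so the following is only an illustrative sketch of one common layering scheme, not the authors' procedure. It assumes each subtask is placed in the layer equal to the length of the longest dependency chain leading to it; the function name `layer_dag` and the sample task names are hypothetical.

```python
from collections import defaultdict, deque


def layer_dag(nodes, edges):
    """Group DAG nodes into layers by longest-path depth (an assumed scheme).

    nodes: iterable of hashable subtask ids
    edges: iterable of (u, v) pairs meaning "v depends on u"
    Returns a list of layers; every node's predecessors sit in earlier layers.
    """
    nodes = list(nodes)
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Kahn-style topological sweep, tracking the longest path to each node.
    level = {n: 0 for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in successors[u]:
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    if visited != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")

    layers = defaultdict(list)
    for n in nodes:
        layers[level[n]].append(n)
    return [layers[i] for i in range(len(layers))]


# Hypothetical five-subtask application.
tasks = ["a", "b", "c", "d", "e"]
deps = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("a", "d"), ("d", "e")]
result = layer_dag(tasks, deps)
print(result)
```

With this layering, subtasks in the same layer have no dependencies on each other, which is what would allow a per-layer analysis of completion time.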
Added 20 May 2010
Updated 20 May 2010
Type Conference
Year 2009
Where CCGRID
Authors Hui Jin, Xian-He Sun, Ziming Zheng, Zhiling Lan, Bing Xie