Sciweavers

482 search results
» A large-scale study of failures in high-performance computin...
ICPP
2008
IEEE
13 years 11 months ago
Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study
Despite great efforts on the design of ultra-reliable components, the increase of system size and complexity has outpaced the improvement of component reliability. As a result, fa...
Jiexing Gu, Ziming Zheng, Zhiling Lan, John White,...
WEA
2005
Springer
13 years 10 months ago
High-Performance Algorithm Engineering for Large-Scale Graph Problems and Computational Biology
Many large-scale optimization problems rely on graph-theoretic solutions; yet high-performance computing has traditionally focused on regular applications with high degre...
David A. Bader
HIPC
2000
Springer
13 years 8 months ago
Meta-data Management System for High-Performance Large-Scale Scientific Data Access
Many scientific applications manipulate large amounts of data and, therefore, are parallelized on high-performance computing systems to take advantage of their computational power a...
Wei-keng Liao, Xiaohui Shen, Alok N. Choudhary
DSN
2006
IEEE
13 years 11 months ago
Improving BGP Convergence Delay for Large-Scale Failures
Border Gateway Protocol (BGP) is the standard routing protocol used in the Internet for routing packets between the Autonomous Systems (ASes). It is known that BGP can take hundre...
Amit Sahoo, Krishna Kant, Prasant Mohapatra
IPPS
2005
IEEE
13 years 10 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...