SC
2015
ACM

A practical approach to reconciling availability, performance, and capacity in provisioning extreme-scale storage systems

The increasing data demands from high-performance computing applications significantly accelerate the capacity, capability, and reliability requirements of storage systems. As systems scale, component failures and repair times increase, significantly impacting data availability. A wide array of decision points must be balanced in designing such systems. We propose a systematic approach that balances and optimizes both initial and continuous spare provisioning, based on a detailed investigation of the anatomy of extreme-scale storage systems and an analysis of their field failure data. We simultaneously consider component failure characteristics and their cost and impact at the system level. We build a tool to evaluate different provisioning schemes, and the results demonstrate that our optimized provisioning can reduce the duration of data unavailability by as much as 52% under a fixed budget. We also observe that non-disk components have much higher failure rates than disks, and warrant car...
Added 17 Apr 2016
Updated 17 Apr 2016
Type Journal
Year 2015
Where SC