Sciweavers

FAST
2010

Black-Box Problem Diagnosis in Parallel File Systems

We focus on automatically diagnosing different performance problems in parallel file systems by identifying, gathering, and analyzing OS-level, black-box performance metrics on every node in the cluster. Our peer-comparison diagnosis approach compares the statistical attributes of these metrics across I/O servers to identify the faulty node. We develop a root-cause analysis procedure that further analyzes the affected metrics to pinpoint the faulty resource (storage or network), and demonstrate that this approach applies generally across stripe-based parallel file systems. We demonstrate our approach for realistic storage and network problems injected into three different file-system benchmarks (dd, IOzone, and PostMark), in both PVFS and Lustre clusters.
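The peer-comparison idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a simple median-absolute-deviation (MAD) test as a stand-in for "comparing statistical attributes across I/O servers". The function name `find_outlier_servers`, the `threshold` value, the server names, and the sample metric values (imagined as a per-server disk wait time in ms) are all assumptions made for illustration.

```python
from statistics import median


def find_outlier_servers(metrics, threshold=3.0):
    """Return servers whose metric deviates strongly from their peers.

    metrics:   dict mapping server name -> list of samples of one metric
    threshold: hypothetical cutoff, in multiples of the peers' MAD
    """
    # Summarize each server's samples by their median.
    summary = {server: median(samples) for server, samples in metrics.items()}
    # The peer baseline is the median of the per-server summaries.
    baseline = median(summary.values())
    # MAD: the typical distance of a server from the baseline.
    mad = median(abs(value - baseline) for value in summary.values())
    if mad == 0:
        # Most peers sit exactly on the baseline; flag any that do not.
        return sorted(s for s, v in summary.items() if v != baseline)
    return sorted(
        s for s, v in summary.items() if abs(v - baseline) / mad > threshold
    )


# Hypothetical samples: three healthy servers and one slow one.
samples = {
    "ios0": [10, 11, 9],
    "ios1": [10, 12, 11],
    "ios2": [9, 10, 10],
    "ios3": [48, 52, 50],
}
print(find_outlier_servers(samples))  # ['ios3']
```

The sketch relies on the same assumption the abstract implies: in a stripe-based file system, fault-free I/O servers see similar load, so a server that departs from the peer baseline is the likely culprit. It says nothing about which resource (storage or network) is at fault; that would require looking at which metric deviates.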
Michael P. Kasick, Jiaqi Tan, Rajeev Gandhi, Priya Narasimhan
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference
Year 2010
Where FAST
Authors Michael P. Kasick, Jiaqi Tan, Rajeev Gandhi, Priya Narasimhan