MSS
2003
IEEE

Reliability Mechanisms for Very Large Storage Systems

Reliability and availability are increasingly important in large-scale storage systems built from thousands of individual storage devices. Large systems must survive the failure of individual components; in systems with thousands of disks, even infrequent failures are likely in some device. We focus on two types of errors: nonrecoverable read errors and drive failures. We discuss mechanisms for detecting and recovering from such errors, introducing improved techniques for detecting errors in disk reads and fast recovery from disk failure. We show that simple RAID cannot guarantee sufficient reliability; our analysis examines the tradeoffs among other schemes between system availability and storage efficiency. Based on our data, we believe that two-way mirroring should be sufficient for most large storage systems. For those that need very high reliability, we recommend either three-way mirroring or mirroring combined with RAID.
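The claim that infrequent failures become likely somewhere in a system of thousands of disks can be illustrated with a short calculation. The sketch below assumes that disks fail independently and with the same probability over some fixed period. That assumption is a simplification made here, not the model used in the paper. The function name `prob_any_failure` and the 1% per-disk figure are hypothetical values chosen only for illustration.

```python
def prob_any_failure(num_disks: int, per_disk_prob: float) -> float:
    """Probability that at least one of num_disks disks fails.

    Assumes independent, identically distributed failures (a simplification).
    """
    # P(no disk fails) = (1 - p)^N, so P(at least one fails) is its complement.
    return 1.0 - (1.0 - per_disk_prob) ** num_disks


if __name__ == "__main__":
    # Hypothetical per-disk failure probability over the period of interest.
    assumed_per_disk_prob = 0.01
    for n in (1, 100, 10_000):
        print(n, prob_any_failure(n, assumed_per_disk_prob))
```

Under these assumptions, the chance of at least one failure grows quickly with the number of disks, which is why a large system has to be designed to survive individual device failures rather than to avoid them.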
Added 05 Jul 2010
Updated 05 Jul 2010
Type Conference
Year 2003
Where MSS
Authors Qin Xin, Ethan L. Miller, Thomas J. E. Schwarz, Darrell D. E. Long, Scott A. Brandt, Witold Litwin