SRDS
2008
IEEE

Probabilistic Failure Detection for Efficient Distributed Storage Maintenance

Distributed storage systems often use data replication to mask failures and guarantee high data availability. Node failures can be transient or permanent. While the system must generate new replicas to replace replicas lost to permanent failures, it can save significant replication costs by not replicating after transient failures. Given the unpredictability of network dynamics, however, distinguishing permanent from transient failures is extremely difficult. Traditional timeout approaches are difficult to tune and can introduce unnecessary replication. In this paper, we propose Protector, an algorithm that addresses this problem using network-wide statistical prediction. Our algorithm drastically improves prediction accuracy by making predictions across aggregate replica groups instead of single nodes. These estimates of the number of "live replicas" can guide efficient data replication policies. We prove that given data on node down times and the probability of permanent ...
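
The idea of estimating the number of "live replicas" across a replica group, rather than applying a timeout to each node, can be illustrated with a minimal sketch. This is not the Protector algorithm itself, whose details are not given in the abstract. It assumes, purely for illustration, that transient down times are exponentially distributed with a known mean, that a prior probability of permanent failure is known, and that "live" means "not permanently lost". The function names and parameters (`prob_permanent`, `expected_live_replicas`, `p_permanent`, `mean_transient_downtime`) are hypothetical.

```python
import math


def prob_permanent(down_time, p_permanent, mean_transient_downtime):
    """Posterior probability that a node observed down for `down_time`
    has failed permanently (Bayes' rule).

    Assumption (not from the paper): transient outages have exponentially
    distributed durations with mean `mean_transient_downtime`.
    """
    # Probability that a transient outage lasts at least `down_time`.
    still_down_if_transient = math.exp(-down_time / mean_transient_downtime)
    # A permanently failed node is still down with probability 1.
    numerator = p_permanent
    denominator = p_permanent + (1.0 - p_permanent) * still_down_if_transient
    return numerator / denominator


def expected_live_replicas(down_times, p_permanent, mean_transient_downtime):
    """Expected number of replicas in a group that are not permanently lost.

    `down_times` holds one entry per replica: None if the node is currently
    up, otherwise how long it has been unreachable.
    """
    total = 0.0
    for t in down_times:
        if t is None:
            total += 1.0  # node is up, so its replica is certainly live
        else:
            total += 1.0 - prob_permanent(t, p_permanent, mean_transient_downtime)
    return total


if __name__ == "__main__":
    # Hypothetical group of four replicas: two up, two down for 2 and 48 hours.
    group = [None, None, 2.0, 48.0]
    estimate = expected_live_replicas(group, p_permanent=0.1,
                                      mean_transient_downtime=6.0)
    print(f"expected live replicas: {estimate:.2f}")
```

A replication policy built on such an estimate would trigger repair only when the group-level expectation drops below a target replica count, instead of reacting to each individual node timeout.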
Added 01 Jun 2010
Updated 01 Jun 2010
Type Conference
Year 2008
Where SRDS
Authors Jing Tian, Zhi Yang, Wei Chen, Ben Y. Zhao, Yafei Dai