Sciweavers

ACTA
2005

Optimal recovery schemes in fault tolerant distributed computing

Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all n computers are up and running, we would like the load to be evenly distributed among them. When one or more computers break down, their load must be redistributed to other computers in the system. The redistribution is determined by the recovery scheme, which is governed by a sequence of integers modulo n. Each sequence guarantees minimal load on the computer that has maximal load, even when the most unfavorable combination of computers goes down. We calculate the best possible such recovery schemes for any number of crashed computers by an exhaustive search, where brute-force testing is avoided by a mathematical reformulation of the problem and a branch-and-bound algorithm. The search nevertheless has a high complexity. Optimal sequences, and thus a corresponding optimal bound, are presented for a maximum of twenty-one computers in the distr...
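The worst-case criterion in the abstract can be illustrated with a small Python sketch. The abstract does not spell out the exact model, so everything below is an assumption made for illustration: each of the n computers starts with one unit of load, the recovery list of computer c is (c + r) mod n for each offset r in the sequence, and the whole load of a crashed computer moves to the first surviving computer on its list. The function names max_load and worst_case_load are hypothetical. The sketch checks crash sets by plain brute force, which is exactly what the paper's reformulation and branch-and-bound search are said to avoid.

```python
from itertools import combinations


def max_load(n, offsets, crashed):
    """Load on the busiest surviving computer.

    Assumed model (not taken from the paper): one unit of load per
    computer, and a crashed computer c hands its load to the first
    surviving computer among (c + r) % n for r in offsets.
    """
    load = [0 if c in crashed else 1 for c in range(n)]
    for c in crashed:
        for r in offsets:
            target = (c + r) % n
            if target not in crashed:
                load[target] += 1
                break
        else:
            raise ValueError("no surviving computer in the recovery list")
    return max(load)


def worst_case_load(n, offsets, k):
    """Maximum of max_load over every set of k crashed computers."""
    return max(
        max_load(n, offsets, set(crashed))
        for crashed in combinations(range(n), k)
    )


if __name__ == "__main__":
    # Same offsets, different order: the order changes the worst case.
    print(worst_case_load(4, [1, 2, 3], 2))
    print(worst_case_load(4, [1, 3, 2], 2))
```

Under these assumptions, with four computers and two crashes, the sequence 1, 2, 3 lets one survivor end up with three units of load, whereas 1, 3, 2 keeps every survivor at two, which shows why the choice of sequence matters.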
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2005
Where ACTA
Authors Kamilla Klonowska, Håkan Lennerstad, Lars Lundberg, Charlie Svahnberg