We propose increasing the survivability of data stored in two-dimensional RAID arrays by letting these arrays reorganize themselves whenever they detect a disk failure. This reorg...
We propose a new algorithm for recovering asynchronously from failures in a distributed computation. Our algorithm is based on two novel concepts: a fault-tolerant vector clock t...
A failure detector is a service that provides (approximate) information about process crashes in a distributed system. The well-known “eventually perfect” failure detector, ◇P...
A recent study characterizing failures in computer networks shows that transient single element (node/link) failures are the dominant failures in large communication networks like...
We take advantage of the hierarchical structure of the star graph network to obtain an efficient method for constructing node-disjoint paths between arbitrary pairs of nodes in...