Given a set of n different deterministic finite state machines (DFSMs) modeling a distributed system, we examine the problem of tolerating f crash or Byzantine faults in such a ...
Vinit A. Ogale, Bharath Balasubramanian, Vijay K. ...
TPT-RAID is a multi-box RAID wherein each ECC group comprises at most one block Jrom any given storage box, and can thus tolerate a boxJailure. It extends the idea ojan out-oj-ban...
Dynamic error processing approaches are an important mechanism to increase the reliability in a multiprocessor system, while making efficient use of the available resources. To th...
Andrea Bondavalli, Silvano Chiaradonna, Felicita D...
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...