A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. These faults do not cause permanent damage, but may result in incorr...
David Walker, Lester W. Mackey, Jay Ligatti, Georg...
Wait-free implementations of shared objects tolerate the failure of processes, but not the failure of base objects from which they are implemented. We consider the problem of imple...
Secure, fault-tolerant distributed systems are difficult to build, to validate, and to operate. Conservative design for such systems dictates that their security and fault toleran...
In recent years, there have been a few proposals to add a small amount of trusted hardware at each replica in a Byzantine fault tolerant system to cut back replication factors. Th...
Allen Clement, Flavio Junqueira, Aniket Kate, Rodr...
A central problem in massively parallel computing is efficiently routing data between processors. This problem is complicated by two considerations. First, in any massively parall...