This paper proposes a variation of the Byzantine generals problem (or Byzantine consensus). Each general has a set of good plans and a set of bad plans. The problem is to make all...
Miguel Correia, Alysson Neves Bessani, Paulo Ver...
Formal methods can improve the development of systems with high-quality requirements, since they usually offer a precise, nonambiguous specification language and allow rigorous veri...
We initiate the study of error confinement in distributed applications, where the goal is that only nodes that were directly hit by a fault may deviate from their correct external...
This paper presents a new fault injection tool called Exhaustif (Exhaustive Workbench for Systems Reliability). Exhaustif is a SWIFI fault injection tool for fault tolerance verif...
The productivity of HPC systems is determined not only by their performance but also by their reliability. The conventional method for limiting the impact of failures is checkpointing...