Checkpointing is a commonly used approach to provide fault-tolerance and improve system dependability. However, using a constant and preconfigured checkpointing frequency may comp...
End-to-end consensus ensures delivery of the same value to the application layer running in distributed processes. Deliveries that have not been acknowledged by the application be...
Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational “vertices” with communication ...
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrel...
Redundant Arrays of Independent Components (RAIC) is a technology that uses groups of similar or identical distributed components to provide dependable services [1,2,3]. RAIC allo...
Performing dependability evaluation along with other analyses at architectural level allows both making architectural tradeoffs and predicting the effects of architectural decisi...
Ana-Elena Rugina, Peter H. Feiler, Karama Kanoun, ...