This paper shows that, in an environment where we do not bound the number of faulty processes, the class P of Perfect failure detectors is the weakest (among realistic failure det...
An important aspect in the development of dependable software is to decide where to locate mechanisms for efficient error detection and recovery. We present a comparison between ...
We present ideas on how to structure software systems for high availability by considering MTTR/MTTF characteristics of components in addition to the traditional criteria, such as...
George Candea, James Cutler, Armando Fox, Rushabh ...
The NASA Remote Exploration and Experimentation (REE) Project, managed by the Jet Propulsion Laboratory, has the vision of bringing commercial supercomputing technology into space...
Traditional problem determination techniques rely on static dependency models that are difficult to generate accurately in today’s large, distributed, and dynamic application e...
Mike Y. Chen, Emre Kiciman, Eugene Fratkin, Armand...