This paper proposes and evaluates error detection and recovery mechanisms suitable for embedded systems. The purpose of these mechanisms is to provide detection of and recovery fr...
In 1986 Jim Gray published his landmark study of the causes of failures of Tandem systems and the techniques Tandem used to prevent such failures [6]. Seventeen years later, Inter...
David L. Oppenheimer, Archana Ganapathi, David A. ...
In this paper we show how to reduce downtime of J2EE applications by rapidly and automatically recovering from transient and intermittent software failures, without requiring appl...
George Candea, Emre Kiciman, Shinichi Kawamoto, Ar...
Learning from software failures is an essential step towards the development of more reliable software systems and processes. However, as more intricate software systems are devel...
Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. While ma...
Asim Kadav, Matthew J. Renzelmann, Michael M. Swif...