Fault-tolerant programs are typically not only difficult to implement but also incur extra costs in terms of performance or resource consumption. Failures are typically relatively ...
Ilwoo Chang, Matti A. Hiltunen, Richard D. Schlich...
Testing methods are compared in a model where program failures are detected and the software changed to eliminate them. The question considered is whether it is better to use test...
Phyllis G. Frankl, Richard G. Hamlet, Bev Littlewo...
We adapt the classic CUSUM change-point detection algorithm for applications to data network monitoring, where various and numerous performance and reliability metrics are availabl...
Failure detectors are a very important building block in distributed applications. The speed and accuracy of the failure detectors are critical to the performance of the ...
Failure resilience is one of the desired features of the Internet. Most traditional restoration architectures are based on a single-failure assumption, which is unrealistic. M...