Computer systems often fail due to many factors such as software bugs or administrator errors. Diagnosing such production run failures is an important but challenging task since i...
Ding Yuan, Haohui Mai, Weiwei Xiong, Lin Tan, Yuan...
Self-healing relies on correct diagnosis of system malfunctioning. This paper presents a use-case based approach to self-diagnosis. Both a static and a dynamic model of a managed-s...
A. Reza Haydarlou, Benno J. Overeinder, Michel A. ...
Failures in plan execution can be attributed to errors in the execution of plan steps or violations of the plan structure. The structure of a plan prescribes which actions have to...
Cees Witteveen, Nico Roos, Adriaan ter Mors, Xiaoy...
—The problem of finding efficient workload distribution techniques is becoming increasingly important today for heterogeneous distributed systems where the availability of comp...
Vladimir Shestak, Edwin K. P. Chong, Anthony A. Ma...
—In distributed computing systems (DCSs) where server nodes can fail permanently with nonzero probability, the system performance can be assessed by means of the service reliabil...