This paper attempts to identify one of the necessary conditions for self-healing, or self-repair, in complex systems, and to propose means for satisfying this condition in heterog...
Web services and service-oriented architecture (SOA) have become the de facto standard for designing distributed and loosely coupled applications. Many service-based applications de...
Harald Psaier, Florian Skopik, Daniel Schall, Scha...
Hardware failures in autonomous and distributed software systems create the need for self-healing activities. This work addresses the problem of redeploying software components af...
The number of processors embedded in high-performance computing platforms is continuously increasing to accommodate users' desire to solve larger and more complex problems. However,...
Thara Angskun, George Bosilca, Graham E. Fagg, Jel...
Current solutions for fault tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...