As distributed storage systems grow, the time between detecting an error and repairing it becomes significant. Systems built on shared servers have additional complexity be...
Justin M. Wozniak, Paul Brenner, Douglas Thain, Aa...
In grid computing systems, providing fault-tolerance is required for both scientific computation and file-sharing to increase their reliability. In previous works, several mechani...
Sangho Yi, Derrick Kondo, Bongjae Kim, Geunyoung P...
Parallel developments are becoming increasingly prevalent in the building and evolution of large-scale software systems. Our previous studies of a large industrial project showed ...
provides a very brief overview of some of the main points. References are given to my papers, where those points are explained in more detail, and citations are provided to the ext...
A common approach to adding self-management capabilities to a system is to provide one or more external control modules, whose responsibility is to monitor system behavior, and ad...
Shang-Wen Cheng, An-Cheng Huang, David Garlan, Bra...