High-performance clusters are rapidly becoming an important computing platform for both scientific and business applications. To fulfill the new demands and challenges, cluster sy...
Zhihong Zhang, Dan Meng, Jianfeng Zhan, Lei Wang, ...
In this paper, we present a structure for monitoring a large set of computational clusters. We illustrate methods for scaling a monitor network comprised of many clusters while ke...
Federico D. Sacerdoti, Mason J. Katz, Matthew L. M...
This paper describes an object-oriented software architecture for cluster integration and management that enables extensibility, portability, and scalability. This architecture ha...
James H. Laros III, Lee Ward, Nathan W. Dauchy, Ro...
We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the ...
Ming Li, Wenchao Tao, Daniel Goldberg, Israel Hsu,...
Evaluating the design of a distributed application is difficult but provides useful information for program development and maintenance. In distributed debugging, for example, proce...