Three protocols for gossip-based failure detection services in large-scale heterogeneous clusters are analyzed and compared. The basic gossip protocol provides a means by which fai...
We consider a system with a dispatcher and several identical servers in parallel. Task processing times are known upon arrival. We first study the impact of the local scheduling ...
This paper is concerned with the design of goal-oriented scheduling policies that deal with multiple goals on production parallel systems. Several objective models are compared, i...
In the facility location problem (FLP) we are given a set of facilities and a set of clients, each of which is to be served by one facility. The goal is to decide which subset of f...
We describe a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. The methodology includes tools for generati...
Brian Tierney, William E. Johnston, Brian Crowley,...