Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we presen...
This paper describes a new method for providingtransparent fault tolerance for parallel applications on a network of workstations. We have designed our method in the context of sh...
In recent years, overlay networks have become an effective alternative to IP multicast for efficient point to multipoint communication across the Internet. Typically, nodes self-...
Dejan Kostic, Adolfo Rodriguez, Jeannie R. Albrech...
Idle desktops have been successfully used to run sequential and master-slave task parallel codes on a large scale in the context of volunteer computing. However, execution of messa...
— Recent advances have demonstrated the potential benefits of coordinated management of thermal load in data centers, including reduced cooling costs and improved resistance to ...
Justin D. Moore, Jeffrey S. Chase, Parthasarathy R...