Sciweavers

» The Evolution of a Distributed Operating System
FAST
2007
15 years 4 months ago
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?
Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we presen...
Bianca Schroeder, Garth A. Gibson
USENIX
1996
15 years 4 months ago
Transparent Fault Tolerance for Parallel Applications on Networks of Workstations
This paper describes a new method for providing transparent fault tolerance for parallel applications on a network of workstations. We have designed our method in the context of sh...
Daniel J. Scales, Monica S. Lam
SOSP
2003
ACM
16 years 8 days ago
Bullet: high bandwidth data dissemination using an overlay mesh
In recent years, overlay networks have become an effective alternative to IP multicast for efficient point to multipoint communication across the Internet. Typically, nodes self-...
Dejan Kostic, Adolfo Rodriguez, Jeannie R. Albrech...
CCGRID
2010
IEEE
15 years 4 months ago
A High-Level Interpreted MPI Library for Parallel Computing in Volunteer Environments
Idle desktops have been successfully used to run sequential and master-slave task parallel codes on a large scale in the context of volunteer computing. However, execution of messa...
Troy P. LeBlanc, Jaspal Subhlok, Edgar Gabriel
ICAC
2006
IEEE
15 years 9 months ago
Weatherman: Automated, Online and Predictive Thermal Mapping and Management for Data Centers
Recent advances have demonstrated the potential benefits of coordinated management of thermal load in data centers, including reduced cooling costs and improved resistance to ...
Justin D. Moore, Jeffrey S. Chase, Parthasarathy R...