As modern supercomputing systems reach the petaflop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety ...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...
The Network Weather Service (NWS) is a distributed resource monitoring and utilization prediction system, employed as an aid to scheduling jobs in a metacomputing environment 9, 1...
Robert E. Busby Jr., Mitchell L. Neilsen, Daniel A...
We propose a prediction-based best-effort real-time service to support distributed, interactive applications in shared, unreserved computing environments. These applicati...
Peter A. Dinda, Loukas F. Kallivokas, Bruce Loweka...
Transactional memory promises to make parallel programming easier than with fine-grained locking, while performing just as well. This performance claim is not always borne out bec...
A new RAID-x (redundant array of inexpensive disks at level x) architecture is presented for distributed I/O processing on a serverless cluster of computers. The RAID-x architectu...