Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the ...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, ...
Abstract. Service Level Agreements (SLA) are needed to allow business interactions to rely on Internet services. Service Level Objectives (SLO) specify the committed performance le...
The High Performance Storage System (HPSS) provides scalable hierarchical storage management (HSM), archive, and file system services. Its design, implementation and current domin...
This paper describes TimeLine, an efficient archive service for a distributed storage system. TimeLine allows users to take snapshots on demand. The archive is stored online so th...
Over the past decade the number of processors in the high performance facilities went up to hundreds of thousands. As a direct consequence, while the computational power follow th...
Aurelien Bouteiller, George Bosilca, Jack Dongarra