
CLUSTER
2002
IEEE

Scalable Resource Management in High Performance Computers

Clusters of workstations have emerged as an important platform for building cost-effective, scalable, and highly available computers. Although many hardware solutions are available today, the largest challenge in making large-scale clusters usable lies in the system software. In this paper we present STORM, a resource management tool designed to provide scalability, low overhead, and the flexibility necessary to efficiently support and analyze a wide range of job-scheduling algorithms. STORM achieves these feats by using a small set of primitive mechanisms that are common in modern high-performance interconnects. The architecture of STORM is based on three main technical innovations. First, a part of the scheduler runs in the thread processor located on the network interface. Second, we use hardware collectives that are highly scalable both for implementing control heartbeats and to distribute the binary of a parallel job in near-constant time. Third, we use an I/O bypass protocol tha...
Added 14 Jul 2010
Updated 14 Jul 2010
Type Conference
Year 2002
Where CLUSTER
Authors Eitan Frachtenberg, Fabrizio Petrini, Juan Fernández, Salvador Coll