There is a growing need for systems that can monitor and analyze application performance data automatically in order to deliver reliable and sustained performance to applications....
Lingyun Yang, Jennifer M. Schopf, Catalin Dumitres...
Despite using high-speed network interconnection systems like InfiniBand, the communication overhead for parallel applications is still high. In this paper we show, how such costs...
Robert Rex, Frank Mietke, Wolfgang Rehm, Christoph...
Abstract--For the management of a virtual P2P supercomputer one is interested in subgroups of processors that can communicate with each other efficiently. The task of finding these...
Clustering is an effective method to increase the available parallelism in VLIW datapaths without incurring severe penalties associated with large number of register file ports. E...
Viktor S. Lapinskii, Margarida F. Jacome, Gustavo ...
As the scale is expanding, node failure becomes a commonplace feature of large-scale cluster systems. As an important part of cluster operating system software, job scheduling tak...
Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bib...