In this paper we present an analytical-based framework for parallel program performance prediction. The main thrust of this work is to provide a means for treating realistic appli...
—In this paper, we analyze restrictions of traditional models affecting the accuracy of analytical prediction of the execution time of collective communication operations. In par...
Alexey L. Lastovetsky, Vladimir Rychkov, Maureen O...
The design of appropriate communication architectures for complex Systems-on-Chip (SoC) is a challenging task. One promising alternative to solve these problems are Networks-on-Ch...
Holger Blume, Thorsten von Sydow, Daniel Becker, T...
This paper is structured as follows. Section 2 gives an architectural description of BlueGene/L. Section 3 analyzes the issue of “computational noise” – the effect that the o...
Kei Davis, Adolfy Hoisie, Greg Johnson, Darren J. ...
This paper describes a scalable, low-complexity alternative to the conventional load/store queue (LSQ) for superscalar processors that execute load and store instructions speculat...