In this paper, we present a method to estimate the number of reconfiguration steps that a time-constrained algorithm can accommodate. This analysis demonstrates how one would attac...
Fully populated tori, where every node has a processor attached, do not scale well because the load on edges increases superlinearly with network size under heavy communication, resulti...
The current prototype of the Genoa Active Message MAchine (GAMMA) is a low-overhead, Active Messages-based inter-process communication layer implemented mainly at kernel le...
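GAMMA is built on the Active Messages model, in which each message names a handler that the receiver runs as soon as the message arrives, instead of buffering the data for a later receive call. The toy, single-process sketch below illustrates only that dispatch idea. `ActiveMessage`, `Endpoint`, `register`, `send` and `poll` are hypothetical names, not GAMMA's interface, and the in-memory queue merely stands in for the network and the kernel-level layer.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, Tuple


@dataclass
class ActiveMessage:
    # Hypothetical message format: a handler identifier plus a small payload.
    handler_id: int
    args: Tuple[Any, ...] = ()


class Endpoint:
    """Hypothetical endpoint; not GAMMA's actual API."""

    def __init__(self) -> None:
        # Table mapping handler identifiers to receiver-side functions.
        self.handlers: Dict[int, Callable[..., None]] = {}
        # In-memory queue standing in for the network and kernel layer.
        self.inbox: Deque[ActiveMessage] = deque()

    def register(self, handler_id: int, fn: Callable[..., None]) -> None:
        self.handlers[handler_id] = fn

    def send(self, dest: "Endpoint", msg: ActiveMessage) -> None:
        # "Transmit" by appending to the destination's queue.
        dest.inbox.append(msg)

    def poll(self) -> int:
        # Run the named handler for every pending message; return how many ran.
        handled = 0
        while self.inbox:
            msg = self.inbox.popleft()
            self.handlers[msg.handler_id](*msg.args)
            handled += 1
        return handled
```

For example, after `b.register(7, log.append)` and `a.send(b, ActiveMessage(7, (42,)))`, a call to `b.poll()` runs the handler, so `log` becomes `[42]`. The handler runs on arrival, and no separate receive call copies the payload into a user buffer. This is the property that lets an Active Messages layer keep per-message overhead low.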
Loop unrolling is one of the most promising parallelization techniques, because most of a program's processing time is typically spent in its loops. Unrolling not...
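Unrolling replicates the loop body so that each iteration does the work of several original iterations. A remainder loop handles the elements left over when the trip count is not a multiple of the unrolling factor. The sketch below unrolls a summation by a factor of 4 and additionally splits the accumulator into four partial sums, so the four additions in each unrolled iteration are independent of one another. That independence is the parallelism unrolling can expose. The function names and the factor of 4 are illustrative choices, not taken from the paper.

```python
from typing import List


def sum_rolled(xs: List[int]) -> int:
    # Original loop: every addition depends on the previous one.
    total = 0
    for x in xs:
        total += x
    return total


def sum_unrolled4(xs: List[int]) -> int:
    # Body replicated 4 times, with four separate accumulators, so the
    # additions inside one iteration have no dependence on each other.
    s0 = s1 = s2 = s3 = 0
    n = len(xs)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        s0 += xs[i]
        s1 += xs[i + 1]
        s2 += xs[i + 2]
        s3 += xs[i + 3]
        i += 4
    # Remainder loop for the last n % 4 elements.
    for j in range(limit, n):
        s0 += xs[j]
    # Integer addition is associative, so the result matches sum_rolled;
    # with floating point, splitting accumulators can change rounding.
    return s0 + s1 + s2 + s3
```

The unrolled loop also executes a quarter as many loop-control tests and branches as the original, which is a separate benefit of unrolling.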
Previous research has used program transformation to introduce parallelism and to exploit data locality. Unfortunately, these two objectives have usually been considered independentl...
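Loop tiling (blocking) is one widely used transformation for data locality and is chosen here purely as an illustration, not as the paper's method. It reorders a loop nest so that it works on small blocks that can stay in cache. The sketch below tiles a matrix multiplication. The outer `ii` and `jj` tile loops write disjoint blocks of `c`, so they are also natural candidates for parallel execution. This hints at why the locality and parallelism objectives interact. The function names and the `tile` parameter are illustrative assumptions.

```python
from typing import List

Matrix = List[List[int]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    # Reference i-k-j loop nest computing c = a * b.
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    # Same computation, iterated block by block to improve reuse of
    # rows of a and b while they are still in cache.
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):          # tile rows of c
        for jj in range(0, p, tile):      # tile columns of c
            for kk in range(0, m, tile):  # reduction dimension
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

For instance, with `a = [[1, 2, 3], [4, 5, 6]]` and `b = [[7, 8], [9, 10], [11, 12]]`, both functions return `[[58, 64], [139, 154]]`. Tiling changes only the order in which the terms are accumulated, so integer results are identical.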