Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hard...
The number of pipeline stages separating dynamic instruction scheduling from instruction execution has increased considerably in recent out-of-order microprocessor implementations...
Software pipelining and unfolding are commonly used techniques to increase parallelism for DSP applications. However, these techniques expand the code size of the application sign...
Bin Xiao, Zili Shao, Chantana Chantrapornchai, Edw...
Bus traffic between the graphics subsystem and memory can become a bottleneck when rendering geometrically complex meshes. In this paper, we investigate the use of vertex caching...
This paper aims to improve locality of references by suitably choosing array layouts. We use a new definition of spatial reuse vectors that takes into account memory layout of arra...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...