Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hard...
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper prese...
RENO is a modified MIPS R10000 register renamer that uses map-table “short-circuiting” to implement dynamic versions of several well-known static optimizations: move eliminat...
Abstract. Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a pana...
Stencil computation (SC) is of critical importance for broad scientific and engineering applications. However, it is a challenge to optimize complex, highorder SC on emerging clus...
Liu Peng, Richard Seymour, Ken-ichi Nomura, Rajiv ...