Short vector SIMD instructions on recent microprocessors, such as SSE on Pentium III and 4, speed up code but are a major challenge to software developers. We present a compiler t...
In this paper we describe techniques for compiling finegrained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Pr...
John A. Stratton, Vinod Grover, Jaydeep Marathe, B...
Exploiting parallelism at both the multiprocessor level and the instruction level is an e ective means for supercomputers to achieve high-performance. The amount of instruction-le...
Scott A. Mahlke, William Y. Chen, John C. Gyllenha...
Whenever large homogeneous data structures need to be processed in a non-trivial way, e.g. in computational sciences, image processing, or system simulation, high-level array prog...
Abstract. Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of l...