—In this work we study the portfolio problem which is to find a good combination of multiple heuristics to solve given instances on parallel resources in minimum time. The resou...
In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that su...
Abstract--With General Purpose programmable GPUs becoming more and more popular, automated tools are needed to bridge the gap between achievable performance from highly parallel ar...
Supercomputer architects strive to maximize the performance of scientific applications. Unfortunately, the large, unwieldy nature of most scientific applications has lead to the...
Richard C. Murphy, Arun Rodrigues, Peter M. Kogge,...
FPGA-based computing engines have become a promising option for the implementation of computationally intensive applications due to high flexibility and parallelism. However, one...
Qiang Liu, George A. Constantinides, Konstantinos ...