As processors continue to exploit more instruction level parallelism, a greater demand is placed on reducing the e ects of memory access latency. In this paper, we introduce a nov...
Despite their superior performance for multimedia applications, vector processors have three limitations that hinder their widespread acceptance. First, the complexity and size of...
Load latency remains a signi cant bottleneck in dynamically scheduled pipelined processors. Load speculation techniques have been proposed to reduce this latency. Dependence Predi...
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...
Improving program locality has become increasingly important on modern computer systems. An effective strategy is to group computations on the same data so that once the data are ...