Data parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly-coupled applications based on line sweeps, this goal has b...
Suppose that a program makes a sequence of m accesses (references) to data blocks, the cache can hold k < m blocks, an access to a block in the cache incurs one time unit, and ...
Clustered microarchitectures are an effective approach to reducing the penalties caused by wire delays inside a chip. Current superscalar processors have in fact a two-cluster mic...
Ramon Canal, Joan-Manuel Parcerisa, Antonio Gonz&a...
This paper describes extensions to OpenMP that implement data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines ...
John Bircsak, Peter Craig, RaeLyn Crowell, Zarka C...
We present a method for creating texture over an arbitrary surface mesh using an example 2D texture. The approach is to identify interesting regions (texture patches) in the 2D ex...