RENO is a modified MIPS R10000 register renamer that uses map-table “short-circuiting” to implement dynamic versions of several well-known static optimizations: move eliminat...
Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
The advent of multicores presents a promising opportunity for exploiting the fine-grained parallelism present in programs. Programs parallelized in the above fashion, typically involv...
This paper introduces dynamic object colocation, an optimization to reduce copying costs in generational and other incremental garbage collectors by allocating connected objects t...