To achieve high performance on multicore systems, shared-memory parallel languages must efficiently implement atomic operations. The commonly used and studied paradigms for atomici...
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are hig...
Scott A. Mahlke, Richard E. Hank, James E. McCormi...
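The C sketch below is a rough, source-level illustration of the idea behind predicated execution. It is not the paper's own compiler technique, which this excerpt does not describe. It contrasts a loop containing a data-dependent branch with a hand if-converted version of the same loop. The function names and the clamp-and-sum workload are hypothetical. In the second version, the condition becomes a 0/1 predicate, and both outcomes are merged arithmetically. This mirrors what predicated hardware does without taking a conditional branch.

```c
#include <stddef.h>

/* Branchy version: each iteration takes a data-dependent conditional
   branch that the processor has to predict. */
long clamp_sum_branchy(const int *a, size_t n, int limit) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > limit)
            sum += limit;
        else
            sum += a[i];
    }
    return sum;
}

/* If-converted version (hypothetical illustration): the condition is
   evaluated into a 0/1 predicate and both candidate values are combined
   arithmetically, so the loop body has no data-dependent branch. */
long clamp_sum_predicated(const int *a, size_t n, int limit) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long p = a[i] > limit;                  /* predicate: 1 or 0 */
        sum += p * limit + (1 - p) * a[i];      /* select without branching */
    }
    return sum;
}
```

Whether a compiler turns either form into predicated or conditional-move instructions depends on the target ISA and optimization level. The source-level rewrite only removes the branch from the C code.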
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
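Below is a minimal sketch of a software data-prefetch helper thread. It assumes POSIX threads and the GCC/Clang `__builtin_prefetch` builtin. The linked-list workload and all names are hypothetical, and the paper's own helper-thread construction is cut off in this excerpt. The helper executes only the pointer-chasing slice of the main thread's loop and issues prefetch hints. As a result, it modifies no program state and cannot change the main thread's result.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node; pointer chasing is hard for hardware prefetchers. */
struct node {
    long value;
    struct node *next;
};

/* Helper thread: walks the list and issues non-binding prefetch hints.
   It writes nothing, so it can only affect timing, never results. */
static void *prefetch_helper(void *arg) {
    for (struct node *p = arg; p != NULL; p = p->next)
        __builtin_prefetch(p->next);   /* hint only; prefetching NULL is safe */
    return NULL;
}

/* Main-thread work: sums the list while the helper (if it started) runs
   alongside it. If thread creation fails, the sum is still computed. */
long sum_with_helper(struct node *head) {
    pthread_t helper;
    int started = pthread_create(&helper, NULL, prefetch_helper, head) == 0;

    long sum = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        sum += p->value;

    if (started)
        pthread_join(helper, NULL);
    return sum;
}
```

The helper only warms caches it shares with the main thread, such as on an SMT core or a shared last-level cache. Any benefit depends on the helper actually running ahead of the main thread, and this sketch does not enforce that.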
In order to extract high levels of performance from modern parallel architectures, the effective management of deep memory hierarchies is very important. While architectural advan...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
The problem attacked in this paper is one of automatically mapping an application onto a Network-on-Chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fas...
Guangyu Chen, Feihui Li, Seung Woo Son, Mahmut T. ...
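To make "locality-aware" mapping concrete, the sketch below scores a thread-to-tile mapping on a 2D mesh NoC. The score sums, for each pair of threads, their traffic multiplied by their hop distance under dimension-ordered (XY) routing. The traffic matrix, mesh width, and function names are hypothetical illustrations. They are not the paper's cost model, whose details this excerpt omits.

```c
#include <stdlib.h>

/* Hop count between tiles a and b of a mesh `width` tiles wide under XY
   routing. Tile t sits at column t % width, row t / width. */
int mesh_hops(int a, int b, int width) {
    return abs(a % width - b % width) + abs(a / width - b / width);
}

/* Communication cost of a mapping. traffic[i * n + j] is the (hypothetical)
   volume thread i sends to thread j; tile_of[i] is the tile running thread i.
   A locality-aware mapper would search for a tile_of that keeps this small. */
long mapping_cost(int n, const int *traffic, const int *tile_of, int width) {
    long cost = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            cost += (long)traffic[i * n + j] * mesh_hops(tile_of[i], tile_of[j], width);
    return cost;
}
```

Consider two heavily communicating thread pairs on a 2x2 mesh, with 100 units of traffic each way within each pair. Placing each pair on adjacent tiles gives a cost of 400. Placing each pair on diagonally opposite tiles doubles the cost to 800.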