Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the ...
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
Heterogeneous multiprocessors are emerging as the dominant implementation approach to embedded multiprocessor systems. In addition to having processing elements suited to the targ...
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...