Abstract—This paper proposes a new software-oriented approach for managing the distributed shared L2 caches of a chip multiprocessor (CMP) for latency-oriented multithreaded appl...
Due to shared cache contentions and interconnect delays, data prefetching is more critical in alleviating penalties from increasing memory latencies and demands on Chip-Multiproce...
Xudong Shi, Zhen Yang, Jih-Kwon Peir, Lu Peng, Yen...
Both commercial and scientific workloads benefit from concurrency and exhibit data sharing across threads/processes. The resulting sharing patterns are often fine-grain, with t...
Hemayet Hossain, Sandhya Dwarkadas, Michael C. Hua...
This paper presents a helper thread prefetching scheme that is designed to work on loosely-coupled processors, such as in a standard chip multi-processor (CMP) system and in an in...
Changhee Jung, Daeseob Lim, Jaejin Lee, Yan Solihi...
Snoopy cache coherence protocols broadcast requests to all nodes, reducing the latency of cache to cache transfer misses at the expense of increasing interconnect power. We propos...