This paper describes a fast, automated technique for accurate on-line estimation of the performance and power consumption of interacting processes in a multi-programmed, multi-cor...
Xi Chen, Chi Xu, Robert P. Dick, Zhuoqing Morley M...
Snoopy cache coherence protocols broadcast requests to all nodes, reducing the latency of cache to cache transfer misses at the expense of increasing interconnect power. We propos...
—Divisible loads are those workloads that can be partitioned by a scheduler into any arbitrary chunks. The problem of scheduling divisible loads has been defined for a long time,...
—The contribution of memory latency to execution time continues to increase, and latency hiding mechanisms become ever more important for efficient processor design. While high-...
—The ever-increasing computational power of contemporary microprocessors reduces the execution time spent on arithmetic computations (i.e., the computations not involving slow me...