Hard-to-predict branches depending on longlatency cache-misses have been recognized as a major performance obstacle for modern microprocessors. With the widening speed gap between...
Hongliang Gao, Yi Ma, Martin Dimitrov, Huiyang Zho...
Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip tha...
Ganesh Venkatesh, Jack Sampson, Nathan Goulding, S...
Abstract—This paper proposes a new software-oriented approach for managing the distributed shared L2 caches of a chip multiprocessor (CMP) for latency-oriented multithreaded appl...
The Distributed Shared Memory (DSM) model is designed to leverage the ease of programming of the shared memory paradigm, while enabling the highperformance by expressing locality ...
ABSTRACT Motivated by data value locality and quality tolerance present in multimedia applications, we propose a new micro-architecture, Region-level Approximate Computation Buffer...