Parallel programs that use critical sections and are executed on a shared-memory multiprocessor with a writeinvalidate protocol result in invalidation actions that could be elimin...
With applications becoming larger and the increasing load on high performance systems, it is important to tackle the I/O bottleneck problem from several angles. It is not only ess...
Murali Vilayannur, Mahmut T. Kandemir, Anand Sivas...
The fill unit is the structure which collects blocks of instructions and combines them into multi-block segments for storage in a trace cache. In this paper, we expand the role of...
Abstract—This paper proposes a new software-oriented approach for managing the distributed shared L2 caches of a chip multiprocessor (CMP) for latency-oriented multithreaded appl...
Dynamic binary translation systems enable a wide range of applications such as program instrumentation, optimization, and security. DBTs use a software code cache to store previou...