Conventional load/store queues (LSQs) are an impediment to both power-efficient execution in superscalar processors and scaling to large-window designs. In this paper, we propose...
Simha Sethumadhavan, Franziska Roesner, Joel S. Em...
The relative decline of single-threaded processor performance, coupled with the ongoing shift towards on chip parallelism requires that CAD applications run efficiently on paralle...
Graphics cards exercise increasingly more computing power and are highly optimized for high data transfer volumes. In contrast typical workstations perform badly when data exceeds...
Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program, and in many cases, th...
Concurrent and incremental collectors require barriers to ensure correct synchronisation between mutator and collector. The overheads imposed by particular barriers on particular ...
Laurence Hellyer, Richard Jones, Antony L. Hosking