To exploit increased instruction-level parallelism available in modern processors, we describe the formation and optimization of tracenets, an integrated approach to reducing the ...
Alexandre E. Eichenberger, Waleed Meleis, Suman Ma...
Virtual machine service threads can perform many tasks in parallel with program execution such as garbage collection, dynamic compilation, and profile collection and analysis. Har...
Dynamic Zero Compression reduces the energy required for cache accesses by only writing and reading a single bit for every zero-valued byte. This energy-conscious compression is i...
Conventional microarchitectures choose a single memory hierarchy design point targeted at the average application. In this paper, we propose a cache and TLB layout and design that...
Rajeev Balasubramonian, David H. Albonesi, Alper B...
In this work we show that value prediction can be used to avoid the penalty of long wire delays by predicting the data that is communicated through these long wires and validating...