It is well recognized that LRU cache-line replacement can be ineffective for applications with large working sets or non-localized memory access patterns. Specifically, in lastle...
Caches are very inefficiently utilized because not all the excess data fetched into the cache, to exploit spatial locality, is utilized. We define cache utilization as the percent...
The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache’s high impact on system performance and power, and because of th...
Clustered microarchitectures are an effective organization to deal with the problem of wire delays and complexity by partitioning some of the processor resources. The organization ...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...