For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...
Cache memories are commonly implemented through multiple memory banks to improve bandwidth and latency. The early knowledge of the data cache bank that an instruction will access ...
Stefan Bieschewski, Joan-Manuel Parcerisa, Antonio...
As technological process shrinks and clock rate increases, instruction caches can no longer be accessed in one cycle. Alternatives are implementing smaller caches (with higher mis...
Ever increasing performance requirements have elevated deeply pipelined architectures to a standard even in the embedded processor domain, requiring the incorporation of dynamic b...
—Performance degradation of memory-intensive programs caused by the LRU policy’s inability to handle weaklocality data accesses in the last level cache is increasingly serious ...