Utilizing the nonuniform latencies of SDRAM devices, access reordering mechanisms alter the sequence of main memory access streams to reduce the observed access latency. Using a r...
Branch misprediction penalty consists of two components: the time wasted on mis-speculative execution until the mispredicted branch is resolved and the time to restart the pipelin...
Amit Gandhi, Haitham Akkary, Srikanth T. Srinivasa...
—This paper introduces the microarchitecture and logical implementation of SMT (Simultaneous Multithreading) improvement of Godson-2 processor which is a 64-bit, four-issue, out-...
Existing schemes for cache energy optimization incorporate a limited degree of dynamic associativity: either direct mapped or full available associativity (say 4-way). In this pap...
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multilevel tiled code is essential for maximizing da...
Albert Hartono, Muthu Manikandan Baskaran, C&eacut...