Untolerated load instruction latencies often have a significant impact on overall program performance. As one means of mitigating this effect, we present an aggressive hardware-b...
With faster CPU clocks and wider pipelines, all relevant microarchitecture components should scale accordingly. There have been many proposals for scaling the issue queue, registe...
This paper explores the concept of micro-architectural loops and discusses their impact on processor pipelines. In particular, we establish the relationship between loose loops an...
Eric Borch, Eric Tune, Srilatha Manne, Joel S. Eme...
Future high-performance billion-transistor processors are likely to employ partitioned architectures to achieve high clock speeds, high parallelism, low design complexity, and low...
3D integration technology greatly increases transistor density while providing faster on-chip communication. 3D implementations of processors can simultaneously provide both laten...