With faster CPU clocks and wider pipelines, all relevant microarchitecture components should scale accordingly. There have been many proposals for scaling the issue queue, registe...
High-performance processors use a large set–associative L1 data cache with multiple ports. As clock speeds and size increase such a cache consumes a significant percentage of t...
Dan Nicolaescu, Alexander V. Veidenbaum, Alexandru...
The use of large instruction windows coupled with aggressive out-oforder and prefetching capabilities has provided significant improvements in processor performance. In this paper...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques...
This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load communication without a store queue and without executing stores in the outof-ord...