Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load communication without a store queue and without executing stores in the outof-ord...
With faster CPU clocks and wider pipelines, all relevant microarchitecture components should scale accordingly. There have been many proposals for scaling the issue queue, registe...
Memory latency tolerant architectures support thousands of in-flight instructions without scaling cyclecritical processor resources, and thousands of useful instructions can compl...
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth...