Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
We revisit memory hierarchy design viewing memory as an inter-operation communication agent. This perspective leads to the development of novel methods of performing inter-operati...
Modern processors rely on memory dependence prediction to execute load instructions as early as possible, speculating that they are not dependent on an earlier, unissued store. To...
Franziska Roesner, Doug Burger, Stephen W. Keckler
This paper describes a scalable, low-complexity alternative to the conventional load/store queue (LSQ) for superscalar processors that execute load and store instructions speculat...
This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load communication without a store queue and without executing stores in the outof-ord...