Allowing loads to issue out-of-order with respect to earlier unresolved store addresses is very important for extracting parallelism in large-window superscalar processors. Blindl...
For maximum performance, an out-of-order processor must issue load instructions as early as possible, while avoiding memory-order violations with prior store instructions that wri...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
Memory operations remain a significant bottleneck in dynamically scheduled pipelined processors, due in part to the inability to statically determine the existence of memory addr...
Glenn Reinman, Brad Calder, Dean M. Tullsen, Gary ...
This paper describes a scalable, low-complexity alternative to the conventional load/store queue (LSQ) for superscalar processors that execute load and store instructions speculat...