Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be incr...
This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load communication without a store queue and without executing stores in the outof-ord...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques...
Silent store instructions write values that exactly match the values that are already stored at the memory address that is being written. A recent study reveals that significant ...
Memory models like SC, TSO, and PC enforce load-load ordering, requiring that loads from any single thread appear to occur in program order to all other threads. Out-of-order execu...