Thread-level speculation provides architectural support to aggressively run hard-to-analyze code in parallel. As speculative tasks run concurrently, they generate unsafe or specul...
In high-end processors, increasing the number of in-flight instructions can improve performance by overlapping useful processing with long-latency accesses to the main memory. Buf...
We have studied DRAM-level prefetching for the fully buffered DIMM (FB-DIMM) designed for multi-core processors. FB-DIMM has a unique two-level interconnect structure, with FB-DIM...
As the performance gap between processors and memory systems increases, the CPU spends more time stalled waiting for data from main memory. Critical long latency instructions, suc...
Nikil Mehta, Brian Singer, R. Iris Bahar, Michael ...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques...