Optimizations performed at link time or directly applied to nal program executables have received increased attention in recent years. This paper discuss the discovery and elimina...
Memory latency tolerant architectures support thousands of in-flight instructions without scaling cyclecritical processor resources, and thousands of useful instructions can compl...
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth...
Virtual multicasting (VMC) combines some of the benefits of caching (transparency, dynamic adaptation to workload) and multicasting (reducing duplicated traffic). Virtual multicas...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques...
This work presents and evaluates a novel processor microarchitecture which combines two paradigms: access/ execute decoupling and simultaneous multithreading. We investigate how b...