Modern microprocessor designs continue to obtain impressive performance gains through increasing clock rates and advances in the parallelism obtained via micro-architecture design...
When integrating software threads together to boost performance on a processor with instruction-level parallel processing support, it is rarely clear which code regions should be ...
Reuse distance (i.e. LRU stack distance) precisely characterizes program locality and has been a basic tool for memory system research since the 1970s. However, the high cost of m...
Xipeng Shen, Jonathan Shaw, Brian Meeker, Chen Din...
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves inst...
In conventional superscalar microarchitectures with partitioned integer and floating-point resources, all floating-point resources are idle during execution of integer programs....
S. Subramanya Sastry, Subbarao Palacharla, James E...