Instruction scheduling is one of the most important steps for improving the performance of object code produced by a compiler. The local instruction scheduling problem is to nd a m...
We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operation...
Andreas Moshovos, Dionisios N. Pnevmatikatos, Amir...
We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stre...
Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia
Load-reuse analysis finds instructions that repeatedly access the same memory location. This location can be promoted to a register, eliminating redundant loads by reusing the re...
Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data...