Performing analysis across module boundaries for an entire program is important for exploiting several runtime performance opportunities. However, due to scalability problems in e...
Sungdo Moon, Xinliang D. Li, Robert Hundt, Dhruva ...
Empirical search is a strategy used during the installation of library generators such as ATLAS, FFTW, and SPIRAL to identify the algorithm or the version of an algorithm that del...
Application-specific instruction set processors (ASIPs) have the potential to meet the challenging cost, performance, and power goals of future embedded processors by customizing ...
Manjunath Kudlur, Kevin Fan, Michael L. Chu, Rajiv...
Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution tech...
Dongkeun Kim, Shih-Wei Liao, Perry H. Wang, Juan d...
In this paper, we present a technique for reducing the overhead of collecting path profiles in the context of a dynamic optimizer. The key idea of our approach, called Targeted Pa...
Efficient inter-thread value communication is essential for improving performance in Thread-Level Speculation (TLS). Although several mechanisms for improving value communication ...
Antonia Zhai, Christopher B. Colohan, J. Gregory S...
Instruction scheduling hardware can be simplified and easily pipelined if pairs of dependent instructions are fused so they share a single instruction scheduling slot. We study an...
Dynamic optimization systems store optimized or translated code in a software-managed code cache in order to maximize reuse of transformed code. Code caches store superblocks that...