Sciweavers

CGO
2004
IEEE
13 years 8 months ago
SYZYGY - A Framework for Scalable Cross-Module IPO
Performing analysis across module boundaries for an entire program is important for exploiting several runtime performance opportunities. However, due to scalability problems in e...
Sungdo Moon, Xinliang D. Li, Robert Hundt, Dhruva ...
CGO
2004
IEEE
13 years 8 months ago
Ispike: A Post-link Optimizer for the Intel® Itanium® Architecture
Chi-Keung Luk, Robert Muth, Harish Patil, Robert S...
CGO
2004
IEEE
13 years 8 months ago
A Dynamically Tuned Sorting Library
Empirical search is a strategy used during the installation of library generators such as ATLAS, FFTW, and SPIRAL to identify the algorithm or the version of an algorithm that del...
Xiaoming Li, María Jesús Garzará...
CGO
2004
IEEE
13 years 8 months ago
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths
Application-specific instruction set processors (ASIPs) have the potential to meet the challenging cost, performance, and power goals of future embedded processors by customizing ...
Manjunath Kudlur, Kevin Fan, Michael L. Chu, Rajiv...
CGO
2004
IEEE
13 years 8 months ago
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution tech...
Dongkeun Kim, Shih-Wei Liao, Perry H. Wang, Juan d...
CGO
2004
IEEE
13 years 8 months ago
Targeted Path Profiling: Lower Overhead Path Profiling for Staged Dynamic Optimization Systems
In this paper, we present a technique for reducing the overhead of collecting path profiles in the context of a dynamic optimizer. The key idea to our approach, called Targeted Pa...
Rahul Joshi, Michael D. Bond, Craig B. Zilles
CGO
2004
IEEE
13 years 8 months ago
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads
Efficient inter-thread value communication is essential for improving performance in Thread-Level Speculation (TLS). Although several mechanisms for improving value communication ...
Antonia Zhai, Christopher B. Colohan, J. Gregory S...
CGO
2004
IEEE
13 years 8 months ago
Using Dynamic Binary Translation to Fuse Dependent Instructions
Instruction scheduling hardware can be simplified and easily pipelined if pairs of dependent instructions are fused so they share a single instruction scheduling slot. We study an...
Shiliang Hu, James E. Smith
CGO
2004
IEEE
13 years 8 months ago
Exploring Code Cache Eviction Granularities in Dynamic Optimization Systems
Dynamic optimization systems store optimized or translated code in a software-managed code cache in order to maximize reuse of transformed code. Code caches store superblocks that...
Kim M. Hazelwood, James E. Smith