The excessive complexity of both machine architectures and applications have made it difficult for compilers to statically model and predict application behavior. This observatio...
Qing Yi, Keith Seymour, Haihang You, Richard W. Vu...
With the increasing gap between processor speed and memory latency, the performance of data-dominated programs are becoming more reliant on fast data access, which can be improved...
Execution-driven simulators are often used for power/energy and performance evaluation. Simulators can provide semantic details but they provide insufficient speed and accuracy f...
Detecting and predicting a program’s execution phases are crucial to dynamic optimizations and dynamically adaptable systems. This paper shows that a phase can be associated with...
Jinpyo Kim, Sreekumar V. Kodakara, Wei-Chung Hsu, ...
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...