This paper presents an automated performance tuning solution, which partitions a program into a number of tuning sections and finds the best combination of compiler options for e...
Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective, high-performance chip multiprocessors (CMPs). Conventional memory controllers deli...
— High-end computing (HEC) systems have passed the petaflop barrier and continue to move toward the next frontier of exascale computing. As companies and research institutes con...
Narayan Desai, Darius Buntinas, Daniel Buettner, P...
Prior research indicates that there is much spatial variation in applications' memory access patterns. Modern memory systems, however, use small fixed-size cache blocks and a...
Stephen Somogyi, Thomas F. Wenisch, Anastassia Ail...
The presented image registration method uses a regularized gradient flow to correlate the intensities in two images. Thereby, an energy functional is successively minimized by des...