Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which...
Kamen Yotov, Xiaoming Li, Gang Ren, Michael Cibuls...
While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high performance computations, the tedious process of performance tuning required...
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Jan...
This paper explores the design space of MMU caches that accelerate virtual-to-physical address translation in processor architectures, such as x86-64, that use a radix tree page t...
The availability of large-scale computing platforms comprised of tens of thousands of multicore processors motivates the need for the next generation of highly scalable sparse line...
Low-latency remote-write networks, such as DEC’s Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of...
Robert Stets, Sandhya Dwarkadas, Nikos Hardavellas...