As the speed gap between CPU and memory widens, memory hierarchy has become the primary factor limiting program performance. Until now, the principal focus of hardware and softwar...
This paper presents a compiler methodology for memoryaware mapping on 2-Dimensional coarse-grained reconfigurable architectures that aims in improving the mapped applications’ p...
Grigoris Dimitroulakos, Michalis D. Galanis, Costa...
This paper introduces a compiler-orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. ...
Rodric M. Rabbah, Hariharan Sandanagobalane, Mongk...
This paper presents a technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on object...
—The performance bottleneck for many scientific applications is the cost of memory access inside linear algebra kernels. Tuning such kernels for memory efficiency is a complex ...