Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses. Loop tiling, in conjunction with other program tran...
The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transf...
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Multi-core processors are proliferated across different domains in recent years. In this paper, we study the performance of frequent pattern mining on a modern multi-core machine....
Many important applications, such as those using sparse data structures, have memory reference patterns that are unknown at compile-time. Prior work has developed runtime reorderi...
Michelle Mills Strout, Larry Carter, Jeanne Ferran...