Helper threading is a technique that utilizes a second core or logical processor in a multi-threaded system to improve the performance of the main thread. A helper thread executes...
Three different partial differential equation (PDE) solver kernels are analyzed in respect to cache memory performance on a simulated shared memory computer. The kernels implement...
Partition Scheduling with Prefetching (PSP) is a memory latency hiding technique which combines the loop pipelining technique with data prefetching. In PSP, the iteration space is...
This paper presents techniques for compiling loops with complex, indirect array accesses into loops whose array references have at most one level of indirection. The transformatio...
Abstract. Technological advances and increasingly complex and dynamic application behavior argue for revisiting mechanisms that adapt logical cache block size to application charac...
Matthew A. Watkins, Sally A. McKee, Lambert Schael...