ICS
2010
Tsinghua U.

Cache oblivious parallelograms in iterative stencil computations

We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations, as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double precision elements, for constant and variable stencils, against hand-optimized naive code and against PluTo, the automatic polyhedral parallelizer and locality optimizer, and demonstrate the clear superiority of our results. The performance benefits stem from a tiling structure that caters for data locality, parallelism and vectorization simultaneously. Rather than tiling the iteration space from inside, we take an exterior approach with a pre-defined hierarchy, simple regular parallelogram tiles and a locality preserving parallelization. These advantages come at the cost of an irregular work-load distribution, but a tightly integrated load-balancer ensures a high utilization of all resources.
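The idea of parallelogram tiles in the space-time iteration space can be illustrated with a much simpler setting than the one in the paper. The sketch below is not the authors' scheme: it is serial, uses a 1D three-point averaging stencil with fixed boundary values, and takes fixed tile sizes rather than a cache oblivious hierarchy. The function names and the `width` and `height` parameters are assumptions made for illustration only. Each tile covers `width` grid points and `height` time steps and is shifted one point to the left per time step, so every value a tile reads has already been produced by the same tile or by a tile to its left.

```python
def step_naive(u0, steps):
    """Reference: sweep the whole array once per time step."""
    n = len(u0)
    cur, nxt = list(u0), list(u0)  # both copies keep the fixed boundary values
    for _ in range(steps):
        for x in range(1, n - 1):
            nxt[x] = (cur[x - 1] + cur[x] + cur[x + 1]) / 3.0
        cur, nxt = nxt, cur
    return cur


def step_parallelogram(u0, steps, width=8, height=4):
    """Same computation, traversed tile by tile in skewed space-time.

    width and height are hypothetical tuning parameters: the spatial
    extent of a tile and the number of time steps it spans.
    """
    n = len(u0)
    # buf[t % 2] holds time level t; boundary entries are never overwritten.
    buf = [list(u0), list(u0)]
    for tb in range(0, steps, height):        # blocks of time steps
        h = min(height, steps - tb)
        n_tiles = (n + h) // width + 1        # enough tiles to cover the skew
        for k in range(n_tiles):              # tiles from left to right
            for dt in range(h):               # time steps inside the tile
                t = tb + dt
                src, dst = buf[t % 2], buf[(t + 1) % 2]
                # The tile slides one point to the left per time step.
                lo = max(1, k * width - dt)
                hi = min(n - 1, (k + 1) * width - dt)
                for x in range(lo, hi):
                    dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0
    return buf[steps % 2]
```

Both functions evaluate the same expression on the same inputs, so they should return identical results; only the traversal order differs. In the tiled version, a tile's working set is roughly `width + height` points per buffer, which is what allows several time steps to be applied while the data stays in cache. The tiles of one time block must be processed left to right here; the parallelization and load balancing described in the abstract are outside the scope of this sketch.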
Added: 19 Jul 2010
Updated: 19 Jul 2010
Type: Conference
Year: 2010
Where: ICS
Authors: Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Hans-Peter Seidel