
ICS
2009
Tsinghua U.

Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems

We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi’s iterative method for the 2-D Poisson equation on a structured grid, in both single and double precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060, and 78% on a C870. Motivated to find a still faster implementation, we further consider “wildly asynchronous” implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations. By doing so, we trade off more flops, via more iterations to converge, for a higher degree of asynchronous parallelism. Our wild imple…
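The abstract contrasts synchronous Jacobi iteration with variants that delay synchronization between iterations. The following is a minimal serial sketch, assuming u = 0 on the boundary of the unit square and the standard 5-point stencil. It is not the authors' CPU/GPU code. The helper names (`relax_rows`, `jacobi_sweep`, `delayed_sync_jacobi`, `max_residual`) and the parameters `num_blocks`, `local_sweeps` and `rounds` are hypothetical. The sketch models delayed synchronization deterministically: contiguous row blocks stand in for processors, and each block reads stale neighbour rows between sync points. Fully chaotic relaxation, where updates may read arbitrary mixes of old and new values with no fixed schedule, is not modelled here.

```python
# Minimal sketch, NOT the authors' implementation: Jacobi relaxation for
# -(u_xx + u_yy) = f on the unit square with u = 0 on the boundary, using the
# 5-point stencil. Grids are (n+2) x (n+2) lists of lists; rows/columns 0 and
# n+1 hold the fixed boundary values. All names here are hypothetical.


def relax_rows(u, f, h, lo, hi):
    """Return a copy of u with rows lo..hi-1 updated by one Jacobi step."""
    n = len(u) - 2
    new = [row[:] for row in u]
    for i in range(lo, hi):
        for j in range(1, n + 1):
            # Every new value reads only values from the old grid u.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1] + h * h * f[i][j])
    return new


def jacobi_sweep(u, f, h):
    """One synchronous Jacobi sweep over all interior rows."""
    return relax_rows(u, f, h, 1, len(u) - 1)


def delayed_sync_jacobi(u, f, h, num_blocks, local_sweeps, rounds):
    """Jacobi with delayed synchronization (hypothetical illustration).

    Interior rows are split into num_blocks contiguous blocks. Between sync
    points each block runs local_sweeps Jacobi sweeps on its own rows while
    its neighbours' rows stay frozen at the last snapshot (stale halo).
    With local_sweeps == 1 this reproduces synchronous Jacobi exactly.
    """
    n = len(u) - 2
    u = [row[:] for row in u]
    bounds = [1 + (n * b) // num_blocks for b in range(num_blocks + 1)]
    for _ in range(rounds):
        snapshot = [row[:] for row in u]  # shared state at the last sync point
        for b in range(num_blocks):
            lo, hi = bounds[b], bounds[b + 1]  # block b owns rows lo..hi-1
            local = snapshot
            for _ in range(local_sweeps):
                local = relax_rows(local, f, h, lo, hi)
            # "Synchronization": publish this block's rows to the shared grid.
            for i in range(lo, hi):
                u[i] = local[i][:]
    return u


def max_residual(u, f, h):
    """Max-norm of f + (discrete Laplacian of u) over interior points."""
    n = len(u) - 2
    r = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            lap = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   - 4.0 * u[i][j]) / (h * h)
            r = max(r, abs(f[i][j] + lap))
    return r


if __name__ == "__main__":
    n = 8
    h = 1.0 / (n + 1)
    f = [[1.0] * (n + 2) for _ in range(n + 2)]
    u0 = [[0.0] * (n + 2) for _ in range(n + 2)]
    u_sync = u0
    for _ in range(200):
        u_sync = jacobi_sweep(u_sync, f, h)
    u_wild = delayed_sync_jacobi(u0, f, h, num_blocks=2, local_sweeps=4,
                                 rounds=50)
    print("synchronous, 200 syncs:", max_residual(u_sync, f, h))
    print("delayed, 50 syncs x 4 local sweeps:", max_residual(u_wild, f, h))
```

With `local_sweeps=1` the delayed variant performs exactly one Jacobi sweep per synchronization point. Larger values spend more arithmetic per synchronization point in exchange for fewer synchronizations, which mirrors the flops-for-parallelism trade-off the abstract describes.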
Sundaresan Venkatasubramanian, Richard W. Vuduc
Added 20 May 2010
Updated 20 May 2010
Type Conference
Year 2009
Where ICS
Authors Sundaresan Venkatasubramanian, Richard W. Vuduc