Sciweavers

ISPASS
2010
IEEE

Weak execution ordering - exploiting iterative methods on many-core GPUs

13 years 2 months ago
Weak execution ordering - exploiting iterative methods on many-core GPUs
Abstract--On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When finegranularity of data communication and synchronization is required for large-scale parallel programs executed by multiple thread blocks, frequent host synchronization are necessary, and they incur a significant overhead. In this paper, we investigate a class of applications which uses a chaotic version of iterative methods [5], [22] to obtain numerical solutions for partial differential equations (PDE). Such a fast PDE solver is parallelized on GPUs with multiple thread blocks. In this parallel implementation, although frequent data communication is needed between adjacent thread blocks, a precise order of the data communication is not necessary. Separate communication threads are used for periodically exchanging the boundary values with adjacent thread blocks through the global memory. Since a precise order of the data communication is not required, the computation and...
Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir,
Added 13 Feb 2011
Updated 13 Feb 2011
Type Journal
Year 2010
Where ISPASS
Authors Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir, Jeff Ho, Lu Peng
Comments (0)