Weak execution ordering - exploiting iterative methods on many-core GPUs


Source: www.cise.ufl.edu

Abstract: On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When fine-grained data communication and synchronization are required for large-scale parallel programs executed by multiple thread blocks, frequent host synchronizations are necessary, and they incur significant overhead. In this paper, we investigate a class of applications that use a chaotic version of iterative methods [5], [22] to obtain numerical solutions of partial differential equations (PDEs). Such a fast PDE solver is parallelized on GPUs with multiple thread blocks. In this parallel implementation, although frequent data communication is needed between adjacent thread blocks, a precise order of the data communication is not necessary. Separate communication threads are used to periodically exchange the boundary values with adjacent thread blocks through the global memory. Since a precise order of the data communication is not required, the computation and...
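The abstract's central idea, that blocks can exchange boundary values without a precise ordering and the iteration still converges, can be illustrated with a small sequential simulation. This is a hedged sketch, not the authors' CUDA implementation. It assumes a 1D Laplace problem u'' = 0 with boundary values u = 0 (left) and u = 1 (right), models each thread block as one slice of the grid with a ghost cell on each side, runs the blocks in a shuffled order every round, and uses a shared `mailbox` list as a stand-in for GPU global memory. The function name `chaotic_jacobi_1d` and all of its parameters are hypothetical.

```python
import random


def chaotic_jacobi_1d(n=16, blocks=4, sweeps_per_exchange=2, rounds=5000, seed=1):
    """Sequential simulation of block-asynchronous ("chaotic") Jacobi relaxation.

    Hypothetical setup: solve u'' = 0 on n interior points with u = 0 at the
    left boundary and u = 1 at the right, so the exact answer is u_i = i / (n + 1).
    """
    assert n % blocks == 0, "this sketch assumes equal-sized blocks"
    size = n // blocks
    rng = random.Random(seed)

    # Each simulated "thread block" owns `size` cells plus one ghost cell per side.
    local = [[0.0] * (size + 2) for _ in range(blocks)]
    local[0][0] = 0.0    # left boundary value (never overwritten)
    local[-1][-1] = 1.0  # right boundary value (never overwritten)

    # Stand-in for global memory: last published [left_edge, right_edge] per block.
    mailbox = [[0.0, 0.0] for _ in range(blocks)]

    for _ in range(rounds):
        order = list(range(blocks))
        rng.shuffle(order)  # no fixed execution order among blocks
        for b in order:
            u = local[b]
            for _ in range(sweeps_per_exchange):
                # Jacobi sweep: the new list is built entirely from the old
                # values before the slice assignment replaces them.
                u[1:-1] = [(u[i - 1] + u[i + 1]) / 2 for i in range(1, size + 1)]
            # Communication step: publish own edge values, then take whatever
            # the neighbours last published, which may be from an earlier round.
            mailbox[b] = [u[1], u[size]]
            if b > 0:
                u[0] = mailbox[b - 1][1]
            if b < blocks - 1:
                u[-1] = mailbox[b + 1][0]

    # Concatenate the interior cells of all blocks, left to right.
    return [x for u in local for x in u[1:-1]]


if __name__ == "__main__":
    sol = chaotic_jacobi_1d()
    n = len(sol)
    print("max error:", max(abs(x - (i + 1) / (n + 1)) for i, x in enumerate(sol)))
```

Because each block reads whatever boundary value its neighbour last published, the exchange has no fixed order and may use stale data; for Jacobi relaxation on this problem, asynchronous (chaotic) iteration with bounded staleness is known to converge, so the result still approaches the linear profile u_i = i/(n+1). Raising `sweeps_per_exchange` in this sketch trades fewer exchanges for staler boundary values.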

Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir, Jeff Ho, Lu Peng


Data Communication | ISPASS 2010 | Multiple Thread Blocks | Software Engineering | Thread Blocks


Post Info

Added	13 Feb 2011
Updated	13 Feb 2011
Type	Conference
Year	2010
Where	ISPASS
Authors	Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir, Jeff Ho, Lu Peng


Sciweavers
