Tsinghua U.

Large-scale FFT on GPU clusters

A GPU cluster is a cluster whose nodes are equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g. matrix multiplication and LINPACK) and for bandwidth-intensive tasks with data locality (e.g. finite-difference simulation). Bandwidth-intensive tasks without data locality, such as large-scale FFTs, are harder to accelerate, because the bottleneck often lies in the PCI bus between main memory and GPU device memory or in the communication network between workstation nodes. Optimizing FFT performance on a single GPU device alone therefore will not improve overall performance. This paper uses large-scale FFT as an example to show how to achieve substantial speedups for these more challenging tasks on a GPU cluster. Three GPU-related factors lead to better performance: first, the use of GPU devices improves the sustained memory bandwidth for processing large data; second, GPU device memory allows larger subtasks to be processed in whole and hence reduces repea...
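To make the notion of FFT "subtasks" and the lack of data locality more concrete, the sketch below shows the textbook four-step decomposition of a length n1*n2 transform into shorter sub-transforms. This is an illustration only: the abstract does not say which decomposition the paper uses, and the names `dft` and `four_step_fft` are hypothetical. A naive DFT stands in for the sub-transform that a GPU would compute.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT; stands in for a sub-transform done on one device."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """DFT of length n1*n2 built from n2 DFTs of length n1 and n1 DFTs of length n2.

    Input index is split as j = j1*n2 + j2, output index as k = k1 + n1*k2.
    """
    n = n1 * n2
    assert len(x) == n

    # Step 1: for each j2, transform the strided subsequence x[j2], x[j2+n2], ...
    # The stride-n2 access is where data locality is lost.
    a = [dft(x[j2::n2]) for j2 in range(n2)]  # a[j2][k1]

    # Step 2: multiply by twiddle factors exp(-2*pi*i * j2*k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            a[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Step 3: for each k1, transform across j2 (a transposed view of `a`).
    # On a cluster this transpose would correspond to all-to-all communication.
    out = [0j] * n
    for k1 in range(n1):
        b = dft([a[j2][k1] for j2 in range(n2)])  # b[k2]
        for k2 in range(n2):
            out[k1 + n1 * k2] = b[k2]
    return out
```

For example, `four_step_fft(x, 3, 4)` on a 12-element list agrees with `dft(x)` up to floating-point rounding. Larger sub-transforms mean fewer passes over the data, which is one way to read the abstract's point about device memory allowing larger subtasks to be processed in whole.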
Yifeng Chen, Xiang Cui, Hong Mei
Added 19 Jul 2010
Updated 19 Jul 2010
Type Conference
Year 2010
Where ICS