Optimization of linked list prefix computations on multithreaded GPUs using CUDA



We present a number of optimization techniques for computing prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures generally involve highly irregular, fine-grain memory accesses of the kind typical of many computations on linked lists, trees, and graphs. Although the current generation of GPUs provides substantial computational power and extremely high memory bandwidth, these devices may at first appear to be geared primarily toward streamed, highly data-parallel computations. In this paper, we introduce an optimized multithreaded GPU algorithm for prefix computations that uses a randomization process to reduce the problem to a large number of fine-grain computations. We map these fine-grain computations onto multithreaded GPUs in such a way that the processing cost per element is shown to be close to the best possible. Our experimental results show scalability for list sizes ranging from 1M nodes to 256M nodes, and significan...
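The abstract does not spell out the randomization step, so the following is only a rough sequential sketch of one well-known way such a reduction can work: pick random "splitter" nodes, let each splitter scan its own sublist independently (the fine-grain tasks that would map to GPU threads), then combine the sublist totals. It is written in Python rather than CUDA for readability. The function name `list_prefix_sums`, the array layout (`nxt`, `val`), and the `num_splitters` parameter are assumptions made for illustration, not details taken from the paper.

```python
import random


def list_prefix_sums(nxt, val, head, num_splitters, seed=0):
    """Inclusive prefix sums along a linked list stored in arrays.

    nxt[i] is the index of node i's successor, or -1 at the tail.
    Assumes (hypothetically) that every node is reachable from head.
    Returns a list r where r[i] is the sum of values from head to node i.
    """
    n = len(nxt)
    rng = random.Random(seed)

    # Step 1: pick random splitters; the head is always a splitter.
    candidates = [i for i in range(n) if i != head]
    splitters = [head] + rng.sample(candidates, min(num_splitters, len(candidates)))
    is_splitter = [False] * n
    for s in splitters:
        is_splitter[s] = True

    # Step 2: each splitter walks its sublist until the next splitter.
    # The walks are independent, so on a GPU each could run in its own thread.
    local = [0] * n          # prefix sum within the node's sublist
    owner = [0] * n          # splitter that owns the node
    sub_total = {}           # total of each sublist
    next_splitter = {}       # splitter that starts the following sublist
    for s in splitters:
        running = 0
        node = s
        while True:
            running += val[node]
            local[node] = running
            owner[node] = s
            node = nxt[node]
            if node == -1 or is_splitter[node]:
                break
        sub_total[s] = running
        next_splitter[s] = node

    # Step 3: exclusive prefix sums over the short list of splitters.
    offset = {}
    acc = 0
    s = head
    while s != -1:
        offset[s] = acc
        acc += sub_total[s]
        s = next_splitter[s]

    # Step 4: add each sublist's offset to its local prefix sums.
    return [local[i] + offset[owner[i]] for i in range(n)]


# List order is 2 -> 0 -> 3 -> 1 with values 30, 10, 40, 20.
print(list_prefix_sums([3, -1, 0, 1], [10, 20, 30, 40], head=2, num_splitters=2))
# prints [40, 100, 30, 80]
```

The result does not depend on which splitters are drawn; the random choice only affects how evenly the work is divided among the independent walks.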

Zheng Wei, Joseph JáJá


Computation | Distributed And Parallel Computing | Fine-grain Computations | IPPS 2010 | Prefix Computations


Added	13 Feb 2011
Updated	13 Feb 2011
Type	Journal
Year	2010
Where	IPPS
Authors	Zheng Wei, Joseph JáJá
