Nested data-parallelism on the gpu

11 years 2 months ago
Nested data-parallelism on the gpu
Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs but, because of their Single-Instruction-Multiple-Data (SIMD) architecture, they are hard to program. Most of the programs ported to GPUs thus far use traditional data-level parallelism, performing only operations that operate uniformly over vectors. NESL is a first-order functional language that was designed to allow programmers to write irregular-parallel programs — such as parallel divide-and-conquer algorithms — for wide-vector parallel computers. This paper presents our port of the NESL implementation to work on GPUs and provides empirical evidence that nested data-parallelism (NDP) on GPUs significantly outperforms CPUbased implementations and matches or beats newer GPU languages that support only flat parallelism. While our performance does not match that of hand-tuned CUDA programs, we argue that the notational conciseness of NESL is worth th...
Lars Bergstrom, John H. Reppy
Added 29 Sep 2012
Updated 29 Sep 2012
Type Journal
Year 2012
Where ICFP
Authors Lars Bergstrom, John H. Reppy
Comments (0)