Nested data-parallelism on the gpu

11 years 11 months ago

Download people.cs.uchicago.edu

Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs but, because of their Single-Instruction-Multiple-Data (SIMD) architecture, they are hard to program. Most of the programs ported to GPUs thus far use traditional data-level parallelism, performing only operations that operate uniformly over vectors. NESL is a ﬁrst-order functional language that was designed to allow programmers to write irregular-parallel programs — such as parallel divide-and-conquer algorithms — for wide-vector parallel computers. This paper presents our port of the NESL implementation to work on GPUs and provides empirical evidence that nested data-parallelism (NDP) on GPUs signiﬁcantly outperforms CPUbased implementations and matches or beats newer GPU languages that support only ﬂat parallelism. While our performance does not match that of hand-tuned CUDA programs, we argue that the notational conciseness of NESL is worth th...

Lars Bergstrom, John H. Reppy

Real-time Traffic