High-performance CUDA kernel execution on FPGAs

In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state-of-the-art high-level synthesis tool AutoPilot from AutoESL, to efficiently map the exposed parallelism in CUDA kernels onto reconfigurable devices. The use of the CUDA programming model offers the advantage of a common programming interface for exploiting parallelism on two very different types of accelerators: FPGAs and GPUs. Moreover, by leveraging the advanced synthesis capabilities of AutoPilot, we enable efficient exploitation of the FPGA configurability for application-specific acceleration. Our flow is based on a compilation process that transforms the SPMD CUDA thread blocks into high-concurrency AutoPilot-C code. We provide an overview of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the generated multi-core accelerators.
Categories and Subject Descriptors D.3.3 [Computer Systems Organization]: Performance of Systems – design stu...
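The abstract describes turning SPMD CUDA thread blocks into C code suitable for high-level synthesis. As a rough illustration of that general idea only, the sketch below rewrites a simple vector-add kernel so that the implicit CUDA threads become an explicit loop in plain C. It is not the authors' generated code: the names `vec_add_block`, `vec_add_grid`, and `BLOCK_DIM` are hypothetical, and no AutoPilot-specific directives are shown.

```c
#include <stddef.h>

/* Hypothetical block size; in CUDA this would be blockDim.x. */
#define BLOCK_DIM 64

/*
 * CUDA original, shown for comparison (illustrative only):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block as a plain C function: the implicit threads of the
 * block become an explicit loop over thread_idx. The iterations are
 * independent, which is the parallelism a synthesis tool could exploit. */
void vec_add_block(const float *a, const float *b, float *c,
                   int n, int block_idx)
{
    for (int thread_idx = 0; thread_idx < BLOCK_DIM; ++thread_idx) {
        int i = block_idx * BLOCK_DIM + thread_idx;
        if (i < n) {
            c[i] = a[i] + b[i];
        }
    }
}

/* The grid becomes an outer loop over blocks. Blocks are independent,
 * so in hardware they could be spread across several cores; here they
 * simply run one after another. */
void vec_add_grid(const float *a, const float *b, float *c, int n)
{
    int num_blocks = (n + BLOCK_DIM - 1) / BLOCK_DIM;
    for (int block_idx = 0; block_idx < num_blocks; ++block_idx) {
        vec_add_block(a, b, c, n, block_idx);
    }
}
```

Calling `vec_add_grid(a, b, c, n)` computes the same result as launching the CUDA kernel with enough blocks of `BLOCK_DIM` threads to cover `n` elements.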
Added 20 May 2010
Updated 20 May 2010
Type Conference
Year 2009
Where ICS
Authors Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-mei W. Hwu