Model-driven autotuning of sparse matrix-vector multiply on GPUs


We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing units (GPUs). Our study consists of two parts. First, we describe several carefully hand-tuned SpMV implementations for GPUs, identifying key GPU-specific performance limitations, enhancements, and tuning opportunities. These implementations, which include variants on classical blocked compressed sparse row (BCSR) and blocked ELLPACK (BELLPACK) storage formats, match or exceed state-of-the-art implementations. For instance, our best BELLPACK implementation achieves up to 29.0 Gflop/s in single precision and 15.7 Gflop/s in double precision on the NVIDIA T10P multiprocessor (C1060), enhancing prior state-of-the-art unblocked implementations (Bell and
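To make the blocked storage idea concrete, here is a minimal CPU-side sketch of SpMV over a BCSR matrix with fixed r x c blocks. It is an illustration only, not the authors' GPU implementation: the function names `dense_to_bcsr` and `bcsr_spmv`, the flat row-major block layout, and the requirement that the matrix dimensions be multiples of the block size are all assumptions made for this example.

```python
def dense_to_bcsr(dense, r, c):
    """Convert a dense matrix (list of rows) into BCSR with r x c blocks.

    Returns (row_ptr, col_idx, blocks). Only blocks holding at least one
    nonzero are stored, each as a flat row-major list of r * c values.
    Assumption: the matrix dimensions are multiples of r and c.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    assert n_rows % r == 0 and n_cols % c == 0
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(n_rows // r):          # block row index
        for bj in range(n_cols // c):      # block column index
            block = [dense[bi * r + i][bj * c + j]
                     for i in range(r) for j in range(c)]
            if any(v != 0 for v in block):
                col_idx.append(bj)
                blocks.append(block)
        row_ptr.append(len(blocks))        # end of this block row
    return row_ptr, col_idx, blocks


def bcsr_spmv(row_ptr, col_idx, blocks, x, r, c):
    """Compute y = A * x for A stored in BCSR with r x c blocks."""
    y = [0.0] * ((len(row_ptr) - 1) * r)
    for bi in range(len(row_ptr) - 1):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            x0 = col_idx[k] * c            # first x entry used by this block
            block = blocks[k]
            for i in range(r):
                acc = 0.0
                for j in range(c):
                    acc += block[i * c + j] * x[x0 + j]
                y[bi * r + i] += acc
    return y


# Small 4 x 4 example with two nonzero 2 x 2 blocks on the diagonal.
A = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 6.0],
]
x = [1.0, 1.0, 2.0, 3.0]
row_ptr, col_idx, blocks = dense_to_bcsr(A, 2, 2)
y = bcsr_spmv(row_ptr, col_idx, blocks, x, 2, 2)
```

Storing one column index per block rather than per nonzero reduces index traffic, at the cost of explicitly stored zeros inside partially filled blocks (the second block above holds two of them). The block size is one natural tuning parameter in a scheme like this.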

Jee W. Choi, Amik Singh, Richard W. Vuduc


Parallel Computing | Performance Model-driven Framework | PPOPP 2010 | State-of-the-art Implementations | State-of-the-art Unblocked Implementations


Post Info

Added	05 Mar 2010
Updated	08 Mar 2010
Type	Conference
Year	2010
Where	PPOPP
Authors	Jee W. Choi, Amik Singh, Richard W. Vuduc
