Sciweavers

AAECC
2007
Springer

When cache blocking of sparse matrix vector multiply works and why

Abstract. We present new performance models and a new, more compact data structure for cache blocking when applied to the sparse matrix-vector multiply (SpM×V) operation, y ← y + A · x. Prior work indicates that cache blocked SpM×V performs very well for some matrix and machine combinations, yielding speedups as high as 3x. We look at the general question of when and why performance improves, finding that cache blocking is most effective when simultaneously 1) x does not fit in cache, 2) y fits in cache, 3) the non-zeros are distributed throughout the matrix, and 4) the non-zero density is sufficiently high. We extend our prior performance models, which bounded performance by assuming x and y fit in cache, to consider these classes of matrices. Unlike our prior model, the updated models are accurate enough to use as a heuristic for predicting the optimum block sizes. We conclude with architectural suggestions that would make processor and memory systems more amenable to SpM×V...
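For readers unfamiliar with the operation, the sketch below illustrates y ← y + A · x on a matrix in compressed sparse row (CSR) form, followed by a simple cache-blocked variant that splits the matrix into rectangular blocks so each block touches only a small slice of x and y. It is a generic illustration only, not the compact data structure proposed in the paper, whose details are not given in this abstract. The function names, the per-block tuple layout, and the `block_rows`/`block_cols` parameters are all assumptions made for the example.

```python
def spmv_csr(row_ptr, col_idx, vals, x, y):
    """Compute y <- y + A*x in place for A stored in CSR form."""
    for i in range(len(row_ptr) - 1):
        acc = y[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def to_cache_blocks(row_ptr, col_idx, vals, n_cols, block_rows, block_cols):
    """Split a CSR matrix into block_rows x block_cols cache blocks.

    Each non-empty block is kept as its own small CSR matrix with column
    indices relative to the block's first column (hypothetical layout).
    """
    n_rows = len(row_ptr) - 1
    blocks = []
    for r0 in range(0, n_rows, block_rows):
        r1 = min(r0 + block_rows, n_rows)
        for c0 in range(0, n_cols, block_cols):
            c1 = min(c0 + block_cols, n_cols)
            b_ptr, b_idx, b_val = [0], [], []
            for i in range(r0, r1):
                for k in range(row_ptr[i], row_ptr[i + 1]):
                    if c0 <= col_idx[k] < c1:
                        b_idx.append(col_idx[k] - c0)
                        b_val.append(vals[k])
                b_ptr.append(len(b_idx))
            if b_idx:  # empty blocks contribute nothing, so skip them
                blocks.append((r0, c0, b_ptr, b_idx, b_val))
    return blocks


def spmv_cache_blocked(blocks, x, y):
    """Compute y <- y + A*x in place, one cache block at a time.

    Within a block, only x[c0 : c0 + block_cols] and
    y[r0 : r0 + block_rows] are accessed.
    """
    for r0, c0, b_ptr, b_idx, b_val in blocks:
        for i in range(len(b_ptr) - 1):
            acc = y[r0 + i]
            for k in range(b_ptr[i], b_ptr[i + 1]):
                acc += b_val[k] * x[c0 + b_idx[k]]
            y[r0 + i] = acc
    return y
```

In practice the block dimensions would be chosen so that the slices of x and y touched by one block stay resident in cache; the abstract indicates that the authors' updated performance models can serve as a heuristic for that choice.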
Added: 08 Dec 2010
Updated: 08 Dec 2010
Type: Journal
Year: 2007
Where: AAECC
Authors: Rajesh Nishtala, Richard W. Vuduc, James Demmel, Katherine A. Yelick