The cache complexity of multithreaded cache oblivious algorithms

We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs $O(n^3/\sqrt{Z} + (Pn)^{1/3} n^2)$ cache misses with high probability when executed by the Cilk scheduler on a machine with $P$ processors, each with a cache of size $Z$. This bound is tighter than previously published bounds. We also present new multithreaded cache oblivious algorithms for three problems: 1D stencil computations, incurring $O(n^2/Z + n + \sqrt{P n^{3+\epsilon}})$ cache misses with high probability; Gaussian elimination and back substitution; and the length computation part of the longest common subsequence problem, incurring $O(n^2/Z + \sqrt{P n^{3.58}})$ cache misses with high probability.
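To make the matrix-multiplication bound concrete, here is a minimal serial sketch in Python of divide-and-conquer multiplication in the style the abstract refers to, next to the naive triple loop, with both run against a toy cache. Everything here is an assumption for illustration: the cache is a single fully associative LRU cache of `Z` one-word lines (not the paper's multiprocessor model), and the names `LRUCache`, `rec_matmul`, `naive_matmul`, and `multiply_both` are hypothetical. The recursion issues its eight sub-products in two phases of four; within a phase the four write disjoint quadrants of `C`, which is where a Cilk version could spawn them in parallel.

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative LRU cache holding Z one-word lines (an assumption)."""

    def __init__(self, Z):
        self.Z = Z
        self.lines = OrderedDict()
        self.misses = 0

    def touch(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)        # hit: mark most recently used
        else:
            self.misses += 1                    # miss: load the line
            self.lines[addr] = None
            if len(self.lines) > self.Z:
                self.lines.popitem(last=False)  # evict the least recently used line

def mac(A, B, C, cache, i, j, k):
    """One multiply-accumulate C[i][j] += A[i][k] * B[k][j], recording its accesses."""
    cache.touch(("A", i, k))
    cache.touch(("B", k, j))
    cache.touch(("C", i, j))
    C[i][j] += A[i][k] * B[k][j]

def naive_matmul(A, B, C, cache, n):
    for i in range(n):
        for j in range(n):
            for k in range(n):
                mac(A, B, C, cache, i, j, k)

def rec_matmul(A, B, C, cache, i0, j0, k0, n):
    """Add the n-by-n block product A[i0.., k0..] * B[k0.., j0..] into C[i0.., j0..].

    n must be a power of two.
    """
    if n == 1:
        mac(A, B, C, cache, i0, j0, k0)
        return
    h = n // 2
    # Phase dk = 0, then phase dk = h. Within a phase the four sub-products write
    # disjoint quadrants of C (parallelizable); the second phase must wait for the first.
    for dk in (0, h):
        for di in (0, h):
            for dj in (0, h):
                rec_matmul(A, B, C, cache, i0 + di, j0 + dj, k0 + dk, h)

def multiply_both(n, Z):
    """Run both orders on the same inputs; return (C_naive, C_rec, naive_misses, rec_misses)."""
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i - j for j in range(n)] for i in range(n)]
    C1 = [[0] * n for _ in range(n)]
    C2 = [[0] * n for _ in range(n)]
    c1, c2 = LRUCache(Z), LRUCache(Z)
    naive_matmul(A, B, C1, c1, n)
    rec_matmul(A, B, C2, c2, 0, 0, 0, n)
    return C1, C2, c1.misses, c2.misses
```

For example, `multiply_both(16, 64)` yields identical products, but the recursive order misses less: in the naive loop every access to `B` misses, because the other 255 words of `B` are touched between reuses, while every 4-by-4 leaf-level block product of the recursion touches only 48 distinct words, which fit in the 64-word cache.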
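For the 1D stencil, one well-known cache oblivious traversal cuts the space-time region into trapezoids, alternating "space cuts" along lines of slope $-\sigma$ with "time cuts". The following is a minimal serial Python sketch of that trapezoidal decomposition, assuming a 3-point stencil ($\sigma = 1$), fixed zero values outside the domain, and a two-row buffer; it is not necessarily the multithreaded algorithm of this paper, and the names `walk1`, `make_kernel`, `stencil_walk`, and `stencil_naive` are hypothetical.

```python
SIGMA = 1  # slope bound of a 3-point stencil: (t+1, x) reads x-1, x, x+1 at time t

def make_kernel(u):
    """Return kernel(t, x), which computes point (t+1, x) from time t.

    u is a two-row buffer where u[t % 2] holds time t; points outside [0, N)
    are treated as fixed zeros (an assumed boundary condition).
    """
    N = len(u[0])

    def kernel(t, x):
        old = u[t % 2]
        left = old[x - 1] if x > 0 else 0.0
        right = old[x + 1] if x + 1 < N else 0.0
        u[(t + 1) % 2][x] = 0.25 * (left + 2.0 * old[x] + right)

    return kernel

def walk1(kernel, t0, t1, x0, dx0, x1, dx1):
    """Visit every (t, x) with t0 <= t < t1 and
    x0 + dx0*(t - t0) <= x < x1 + dx1*(t - t0) in dependency order."""
    dt = t1 - t0
    if dt == 1:
        for x in range(x0, x1):
            kernel(t0, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * SIGMA * dt:
            # Space cut along a line of slope -SIGMA: the left piece needs
            # nothing from the right piece, so it runs first.
            xm = (2 * (x0 + x1) + (2 * SIGMA + dx0 + dx1) * dt) // 4
            walk1(kernel, t0, t1, x0, dx0, xm, -SIGMA)
            walk1(kernel, t0, t1, xm, -SIGMA, x1, dx1)
        else:
            # Time cut: the lower half finishes before the upper half starts.
            s = dt // 2
            walk1(kernel, t0, t0 + s, x0, dx0, x1, dx1)
            walk1(kernel, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)

def stencil_walk(u0, T):
    """Advance u0 by T steps using the trapezoidal traversal."""
    u = [list(u0), [0.0] * len(u0)]
    walk1(make_kernel(u), 0, T, 0, 0, len(u0), 0)
    return u[T % 2]

def stencil_naive(u0, T):
    """Reference: advance u0 by T steps, one full time step at a time."""
    cur, N = list(u0), len(u0)
    for _ in range(T):
        cur = [0.25 * ((cur[x - 1] if x > 0 else 0.0) + 2.0 * cur[x]
                       + (cur[x + 1] if x + 1 < N else 0.0)) for x in range(N)]
    return cur
```

Both functions evaluate the same floating-point expression for every point, so `stencil_walk(u0, T)` and `stencil_naive(u0, T)` should agree exactly; only the visiting order differs. Two time rows suffice because a point is overwritten only after all three of its readers at the next time step have run.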
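The length computation in longest common subsequence fills a table $L$ where $L[i][j]$ depends on $L[i-1][j-1]$, $L[i-1][j]$, and $L[i][j-1]$. A common cache oblivious way to order this work is recursive quadrant splitting, sketched below in Python under stated assumptions: it keeps the full table for clarity, uses a hypothetical `base` cutoff, and is a plain quadrant decomposition rather than a claim about the paper's exact algorithm, which may be structured differently. After the top-left quadrant, the top-right and bottom-left quadrants depend only on it and on already-filled borders, so they are the natural place for parallelism.

```python
def lcs_length(a, b, base=4):
    """Length of a longest common subsequence of a and b.

    L[i][j] is the LCS length of a[:i] and b[:j]. The table is filled by
    recursive quadrant splitting instead of row by row.
    """
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]

    def fill(i0, i1, j0, j1):
        # Fill rows [i0, i1) x columns [j0, j1); cells above and to the left are done.
        if i1 <= i0 or j1 <= j0:
            return
        if i1 - i0 <= base and j1 - j0 <= base:
            for i in range(i0, i1):
                for j in range(j0, j1):
                    if a[i - 1] == b[j - 1]:
                        L[i][j] = L[i - 1][j - 1] + 1
                    else:
                        L[i][j] = max(L[i - 1][j], L[i][j - 1])
            return
        im, jm = (i0 + i1) // 2, (j0 + j1) // 2
        fill(i0, im, j0, jm)   # top-left first
        fill(i0, im, jm, j1)   # top-right and bottom-left depend only on the
        fill(im, i1, j0, jm)   # top-left and outer borders: parallelizable
        fill(im, i1, jm, j1)   # bottom-right last

    fill(1, m + 1, 1, n + 1)
    return L[m][n]
```

For instance, `lcs_length("ABCBDAB", "BDCABA")` returns 4 (one such subsequence is `BCBA`). With `base` at least as large as both inputs, the function degenerates to the ordinary row-by-row dynamic program, which gives a convenient cross-check.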
Matteo Frigo, Volker Strumpen
Added: 14 Jun 2010
Updated: 14 Jun 2010
Type: Conference
Year: 2006
Where: SPAA