Sciweavers

IPPS
2009
IEEE

Minimizing startup costs for performance-critical threading

Using the well-known ATLAS and LAPACK dense linear algebra libraries, we demonstrate that the parallel management overhead (PMO) can grow with problem size on even statically scheduled parallel programs with minimal task interaction. Therefore, the widely held view that these thread management issues can be ignored in such computationally intensive libraries is wrong, and leads to substantial slowdown on today’s machines. We survey several methods for reducing this overhead, the best of which we have not seen in the literature. Finally, we demonstrate that by applying these techniques at the kernel level, performance in applications such as LU and QR factorizations can be improved by almost 40% for small problems, and as much as 15% for large O(N^3) computations. These techniques are completely general, and should yield significant speedup in almost any performance-critical operation. We then show that the lion’s share of the remaining parallel inefficiency comes from bus cont...
Anthony M. Castaldo, R. Clint Whaley
Added 24 May 2010
Updated 24 May 2010
Type Conference
Year 2009
Where IPPS