Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms

A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance as measured by seven criteria:
• time,
• TLB misses,
• L1 misses,
• L2 misses,
• paging to disk,
• scaling on distributed processors, and
• portability to multiple platforms.

It provides a philosophy and tools that let the programmer handle the memory hierarchy invisibly, from L1 and L2 through the TLB, paging, and interprocessor communication. Used together, they support a cache-oblivious style of programming. Plots support these claims for an implementation of Cholesky factorization written directly from the paradigm in C, with a few intrinsic calls. The results in this paper focus on low-level performance, including the new Morton-hybrid representation, which takes advantage of hardware and compiler optimizations. In particular, this code beats Intel's Math Kernel Library and matches AMD's Core Math Library, losing a bit on L1 misses while winning decisivel...
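
The abstract names the Morton-hybrid representation but does not define it here. The sketch below assumes one plausible reading and is offered only as an illustration:
• the matrix is cut into T×T tiles;
• the tiles are laid out along the Morton (Z) curve;
• each tile is stored column-major, so compilers and hardware can treat the base case as an ordinary dense block.

Several details are assumptions, not the paper's code. These are the tile size `T`, the column-major choice inside tiles, and the quadrant order NW, NE, SW, SE (row bits more significant). The function names `dilate16` and `morton_hybrid_offset` are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "Dilate" the low 16 bits of x: bit k of the input moves to bit 2k,
   leaving zeros in the odd bit positions. */
static uint32_t dilate16(uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Offset of element (i, j) in an assumed Morton-hybrid layout:
   T-by-T tiles ordered along the Morton (Z) curve, with the tile-row
   bits in the more significant position of each bit pair (quadrant
   order NW, NE, SW, SE), and column-major storage inside each tile. */
static size_t morton_hybrid_offset(uint32_t i, uint32_t j, uint32_t T) {
    assert(T > 0 && i / T < 65536u && j / T < 65536u);
    uint32_t tile = (dilate16(i / T) << 1) | dilate16(j / T);
    return (size_t)tile * T * T + (size_t)(j % T) * T + (i % T);
}
```

For example, with `T = 2` the element in row 2, column 0 begins the third tile, at offset 8. When n/T is a power of two, each quadrant of a Morton-ordered block occupies one contiguous run of memory. A block-recursive algorithm can then descend into a quadrant just by offsetting a pointer, which is what lets it stay oblivious to the cache sizes.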

Michael D. Adams, David S. Wise

ACM SIGPLAN Wkshp | ACMMSP 2006 | Block-recursive Matrix Algorithms | Hardware | TLB Misses |

Post Info

Added	13 Jun 2010
Updated	13 Jun 2010
Type	Conference
Year	2006
Where	ACMMSP
Authors	Michael D. Adams, David S. Wise
