Sciweavers



ICS
1997
Tsinghua U.


Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology



Download www.icsi.berkeley.edu

Modern microprocessors can achieve high performance on linear algebra kernels, but this currently requires extensive machine-specific hand tuning. We have developed a methodology whereby near-peak performance on a wide range of systems can be achieved automatically for such routines. First, by analyzing current machines and C compilers, we've developed guidelines for writing Portable, High-Performance, ANSI C (PHiPAC, pronounced "fee-pack"). Second, rather than code by hand, we produce parameterized code generators. Third, we write search scripts that find the best parameters for a given system. We report on a BLAS GEMM-compatible multi-level cache-blocked matrix multiply generator which produces code that achieves around 90% of peak on the Sparcstation-20/61, IBM RS/6000-590, HP 712/80i, SGI Power Challenge R8k, and SGI Octane R10k, and over 80% of peak on the SGI Indigo R4k. The resulting routines are competitive with vendor-optimized BLAS GEMMs. CS Division, University of Cali...
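As a rough illustration of what "parameterized" and "cache-blocked" mean in the abstract, the following is a minimal sketch in ANSI C of a matrix multiply with a single level of blocking. It is not the code PHiPAC generates: the function names `gemm_naive` and `gemm_blocked` and the block-size parameters `mb`, `kb`, and `nb` are hypothetical, chosen only for this example. Matrices are assumed to be row-major and densely packed, and both routines compute C += A*B.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple loop: C += A*B.
 * A is M x K, B is K x N, C is M x N, all row-major. */
void gemm_naive(size_t M, size_t K, size_t N,
                const double *A, const double *B, double *C)
{
    size_t i, j, k;
    for (i = 0; i < M; i++) {
        for (j = 0; j < N; j++) {
            double sum = C[i * N + j];
            for (k = 0; k < K; k++)
                sum += A[i * K + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
    }
}

static size_t min_sz(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* One level of cache blocking: C += A*B computed block by block.
 * mb, kb, nb are hypothetical tuning parameters (block sizes along
 * M, K, N). They need not divide the matrix dimensions evenly;
 * partial blocks at the edges are clipped by min_sz. */
void gemm_blocked(size_t M, size_t K, size_t N,
                  const double *A, const double *B, double *C,
                  size_t mb, size_t kb, size_t nb)
{
    size_t i0, k0, j0, i, k, j;

    assert(mb > 0 && kb > 0 && nb > 0);

    for (i0 = 0; i0 < M; i0 += mb) {
        size_t i1 = min_sz(i0 + mb, M);
        for (k0 = 0; k0 < K; k0 += kb) {
            size_t k1 = min_sz(k0 + kb, K);
            for (j0 = 0; j0 < N; j0 += nb) {
                size_t j1 = min_sz(j0 + nb, N);
                /* Multiply one block of A by one block of B. */
                for (i = i0; i < i1; i++) {
                    for (k = k0; k < k1; k++) {
                        const double a = A[i * K + k];
                        for (j = j0; j < j1; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
                }
            }
        }
    }
}
```

A parameter search in the spirit of the abstract could, under these assumptions, time `gemm_blocked` over a set of candidate `(mb, kb, nb)` triples on the target machine and keep the fastest, using `gemm_naive` as a correctness reference.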

Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, James Demmel


ARPA Contract | Distributed And Parallel Computing | ICS 1997 | International Computer Science Institute | Tennessee Subcontract


Post Info

Added	08 Aug 2010
Updated	08 Aug 2010
Type	Conference
Year	1997
Where	ICS
Authors	Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, James Demmel
