Sciweavers

25 search results - page 1 / 5
» Performance portable GPU code generation for matrix multipli...
Sort
View
ASPLOS
2011
ACM
12 years 8 months ago
Sponge: portable stream programming on graphics engines
Graphics processing units (GPUs) provide a low cost platform for accelerating high performance computations. The introduction of new programming languages, such as CUDA and OpenCL...
Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor N. ...
ICS
1997
Tsinghua U.
13 years 8 months ago
Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology
Modern microprocessors can achieve high performance on linear algebra kernels but this currently requires extensive machine-speci c hand tuning. We have developed a methodology wh...
Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, James...
EUROPAR
1998
Springer
13 years 8 months ago
Parallel Sparse Matrix Computations Using the PINEAPL Library: A Performance Study
Abstract. The Numerical Algorithms Group Ltd is currently participating in the European HPCN Fourth Framework project on Parallel Industrial NumErical Applications and Portable Lib...
Arnold R. Krommer
HIPC
2009
Springer
13 years 2 months ago
A performance prediction model for the CUDA GPGPU platform
The significant growth in computational power of modern Graphics Processing Units(GPUs) coupled with the advent of general purpose programming environments like NVIDA's CUDA,...
Kishore Kothapalli, Rishabh Mukherjee, M. Suhail R...
ICCS
2009
Springer
13 years 11 months ago
Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes
The development of optimized codes is time-consuming and requires extensive architecture, compiler, and language expertise, therefore, computational scientists are often forced to ...
Boyana Norris, Albert Hartono, Elizabeth R. Jessup...