Sciweavers

25 search results - page 2 / 5
» Performance portable GPU code generation for matrix multipli...
Sort
View
PLDI
2012
ACM
11 years 7 months ago
Adaptive input-aware compilation for graphics engines
While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high performance computations, the tedious process of performance tuning required...
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Jan...
CCGRID
2011
IEEE
12 years 8 months ago
Small Discrete Fourier Transforms on GPUs
– Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data ...
S. Mitra, A. Srinivasan
PPOPP
2012
ACM
12 years 12 days ago
PARRAY: a unifying array representation for heterogeneous parallelism
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU c...
Yifeng Chen, Xiang Cui, Hong Mei
EGH
2010
Springer
13 years 2 months ago
AnySL: efficient and portable shading for ray tracing
While a number of different shading languages have been developed, their efficient integration into an existing renderer is notoriously difficult, often boiling down to implementi...
Ralf Karrenberg, Dmitri Rubinstein, Philipp Slusal...
ICCS
2009
Springer
13 years 11 months ago
A Note on Auto-tuning GEMM for GPUs
The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is espe...
Yinan Li, Jack Dongarra, Stanimire Tomov