Sciweavers

14 search results - page 1 / 3
» Optimization for performance and energy for batched matrix c...
CCGRID
2011
IEEE
12 years 8 months ago
Small Discrete Fourier Transforms on GPUs
Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data ...
S. Mitra, A. Srinivasan
ICCS
2009
Springer
13 years 11 months ago
A Note on Auto-tuning GEMM for GPUs
The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is espe...
Yinan Li, Jack Dongarra, Stanimire Tomov
ICPR
2008
IEEE
13 years 11 months ago
Incremental clustering via nonnegative matrix factorization
Nonnegative matrix factorization (NMF) has been shown to be an efficient clustering tool. However, NMF's batch nature necessitates recomputation of the whole basis set for new samples...
Serhat Selcuk Bucak, Bilge Günsel
IPPS
2007
IEEE
13 years 11 months ago
Memory Optimizations For Fast Power-Aware Sparse Computations
We consider memory subsystem optimizations for improving the performance of sparse scientific computation while reducing the power consumed by the CPU and memory. We first co...
Konrad Malkowski, Padma Raghavan, Mary Jane Irwin
ICS
2010
Tsinghua U.
13 years 9 months ago
Large-scale FFT on GPU clusters
A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g. matrix multiplication and LINPACK) and bandwidth-i...
Yifeng Chen, Xiang Cui, Hong Mei