– Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data ...
The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is espe...
Nonnegative matrix factorization (NMF) has been shown to be an efficient clustering tool. However, NMF`s batch nature necessitates recomputation of whole basis set for new samples...
— We consider memory subsystem optimizations for improving the performance of sparse scientific computation while reducing the power consumed by the CPU and memory. We first co...
A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g. matrix multiplication and LINPACK) and bandwidth-i...