Search Sciweavers | Sciweavers

14 search results - page 1 / 3

» Optimization for performance and energy for batched matrix c...

CCGRID
2011
IEEE

256 views, Distributed And Parallel Com...

Small Discrete Fourier Transforms on GPUs

12 years 8 months ago

Download www.cs.fsu.edu

Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data ...

S. Mitra, A. Srinivasan

ICCS
2009
Springer

191 views, Applied Computing

A Note on Auto-tuning GEMM for GPUs

13 years 11 months ago

Download www.netlib.org

The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is espe...

Yinan Li, Jack Dongarra, Stanimire Tomov

ICPR
2008
IEEE

147 views, Computer Vision

Incremental clustering via nonnegative matrix factorization

13 years 11 months ago

Download figment.cse.usf.edu

Nonnegative matrix factorization (NMF) has been shown to be an efficient clustering tool. However, NMF's batch nature necessitates recomputation of whole basis set for new samples...

Serhat Selcuk Bucak, Bilge Günsel

IPPS
2007
IEEE

112 views, Distributed And Parallel Com...

Memory Optimizations For Fast Power-Aware Sparse Computations

13 years 11 months ago

Download www.cecs.uci.edu

We consider memory subsystem optimizations for improving the performance of sparse scientific computation while reducing the power consumed by the CPU and memory. We first co...

Konrad Malkowski, Padma Raghavan, Mary Jane Irwin

ICS
2010
Tsinghua U.

214 views, Distributed And Parallel Com...

Large-scale FFT on GPU clusters

13 years 9 months ago

Download sei.pku.edu.cn

A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g. matrix multiplication and LINPACK) and bandwidth-i...

Yifeng Chen, Xiang Cui, Hong Mei
