Sciweavers

19 search results - page 4 / 4
» Automatic Tuning Matrix Multiplication Performance on Graphi...
Sort
View
ICCS
2009
Springer
13 years 12 months ago
A Note on Auto-tuning GEMM for GPUs
The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is espe...
Yinan Li, Jack Dongarra, Stanimire Tomov
FCCM
2007
IEEE
107views VLSI» more  FCCM 2007»
13 years 11 months ago
Optimizing Logarithmic Arithmetic on FPGAs
This paper proposes optimizations of the methods and parameters used in both mathematical approximation and hardware design for logarithmic number system (LNS) arithmetic. First, ...
Haohuan Fu, Oskar Mencer, Wayne Luk
SASP
2009
IEEE
291views Hardware» more  SASP 2009»
14 years 1 days ago
A parameterisable and scalable Smith-Waterman algorithm implementation on CUDA-compatible GPUs
—This paper describes a multi-threaded parallel design and implementation of the Smith-Waterman (SM) algorithm on compute unified device architecture (CUDA)-compatible graphic pr...
Cheng Ling, Khaled Benkrid, Tsuyoshi Hamada
FCCM
2008
IEEE
212views VLSI» more  FCCM 2008»
13 years 11 months ago
Map-reduce as a Programming Model for Custom Computing Machines
The map-reduce model requires users to express their problem in terms of a map function that processes single records in a stream, and a reduce function that merges all mapped out...
Jackson H. C. Yeung, C. C. Tsang, Kuen Hung Tsoi, ...