Search Sciweavers | Sciweavers

12 search results - page 2 / 3

» Mechanizing the expert dense linear algebra developer

PARA
1995
Springer

Applied Computing

A Proposal for a Set of Parallel Basic Linear Algebra Subprograms

13 years 9 months ago

Download phase.hpcc.jp

This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms (PBLAS). The PBLAS are targeted at distributed vector-vector, matrix-vector and matrix-matrix...

Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...

ASPLOS
2009
ACM

Programming Languages

QR decomposition on GPUs

14 years 6 months ago

Download users.ece.gatech.edu

QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and an upper triangular matrix R. Adaptive sys...

Andrew Kerr, Dan Campbell, Mark Richards

ICCS
2009
Springer

Applied Computing

A Note on Auto-tuning GEMM for GPUs

14 years 9 days ago

Download www.netlib.org

The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is espe...

Yinan Li, Jack Dongarra, Stanimire Tomov

IPPS
2002
IEEE

Distributed And Parallel Com...

Optimizing Graph Algorithms for Improved Cache Performance

13 years 10 months ago

Download halcyon.usc.edu

Tiling has long been used to improve cache performance. Recursion has recently been used as a cache-oblivious method of improving cache performance. Both of these techniques are n...

Joon-Sang Park, Michael Penner, Viktor K. Prasanna

ICS
1999
Tsinghua U.

Distributed And Parallel Com...

An experimental evaluation of tiling and shackling for memory hierarchy management

13 years 10 months ago

Download iss.ices.utexas.edu

On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring ...

Induprakas Kodukula, Keshav Pingali, Robert Cox, D...
