Search Sciweavers | Sciweavers

10 search results - page 2 / 2

» Dense linear algebra solvers for multicore with GPU accelera...

CORR
2008
Springer

162 views, Education

Accelerating Scientific Computations with Mixed Precision Algorithms

13 years 5 months ago

Download www.netlib.org

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit ...

Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jak...

PDP
2010
IEEE

218 views, Distributed And Parallel Com...

Experimental Study of Six Different Implementations of Parallel Matrix Multiplication on Heterogeneous Computational Clusters of

14 years 5 days ago

Download hcl.ucd.ie

Two strategies of distribution of computations can be used to implement parallel solvers for dense linear algebra problems for Heterogeneous Computational Clusters of Multicore ...

Pedro Alonso, Ravi Reddy, Alexey L. Lastovetsky

EUROPAR
2009
Springer

131 views, Distributed And Parallel Com...

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs

13 years 12 months ago

Download www.hipeac.net

While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/p...

Eduard Ayguadé, Rosa M. Badia, Francisco D....

ASPLOS
2009
ACM

248 views, Programming Languages

QR decomposition on GPUs

14 years 6 months ago

Download users.ece.gatech.edu

QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...

Andrew Kerr, Dan Campbell, Mark Richards

PPOPP
2010
ACM

222 views, Distributed and Parallel Com...

Scaling LAPACK panel operations using parallel cache assignment

14 years 2 months ago

Download www.cs.utsa.edu

In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...

Anthony M. Castaldo, R. Clint Whaley
