Sciweavers

8 search results - page 1 / 2
» Automatic blocking of QR and LU factorizations for locality
ACMMSP
2004
ACM
13 years 10 months ago
Automatic blocking of QR and LU factorizations for locality
Qing Yi, Ken Kennedy, Haihang You, Keith Seymour, ...
PPOPP
2010
ACM
14 years 2 months ago
Scaling LAPACK panel operations using parallel cache assignment
In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
Anthony M. Castaldo, R. Clint Whaley
CORR
2007
Springer
13 years 4 months ago
A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in or...
Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack...
IPPS
1998
IEEE
13 years 9 months ago
High Performance Linear Algebra Package LAPACK90
Abstract. LAPACK90 is a set of FORTRAN90 subroutines that interface FORTRAN90 with LAPACK. All LAPACK driver subroutines including expert drivers and some LAPACK computationals ha...
Jack Dongarra, Jerzy Wasniewski
CC
2008
Springer
13 years 6 months ago
Coqa: Concurrent Objects with Quantized Atomicity
This paper introduces a new language model, Coqa, for deeply embedding concurrent programming into objects. Every program written in our language has the desirable behaviors of ato...
Yu David Liu, Xiaoqi Lu, Scott F. Smith