Sciweavers

» From Distributed Sequential Computing to Distributed Paralle...
IPPS
2010
IEEE
14 years 11 months ago
An auto-tuning framework for parallel multicore stencil computations
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addi...
Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, ...
EUROPAR
2000
Springer
15 years 5 months ago
Scheduling the Computations of a Loop Nest with Respect to a Given Mapping
When parallelizing loop nests for distributed memory parallel computers, we have to specify when the different computations are carried out (computation scheduling), wher...
Alain Darte, Claude G. Diderich, Marc Gengler, Fr&...
CONCURRENCY
1998
15 years 1 months ago
A new parallel matrix multiplication algorithm on distributed-memory concurrent computers
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Independent Matrix Multiplication Algorithm), for block cyclic data distribution on ...
Jaeyoung Choi
IPPS
1998
IEEE
15 years 6 months ago
Caching-Efficient Multithreaded Fast Multiplication of Sparse Matrices
Several fast sequential algorithms have been proposed in the past to multiply sparse matrices. These algorithms do not explicitly address the impact of caching on performance. We show th...
Peter Sulatycke, Kanad Ghose