Sciweavers

656 search results - page 6 / 132
» Scalable Parallel Matrix Multiplication on Distributed Memor...
ICPP
2009
IEEE
15 years 4 months ago
Performance Models for Blocked Sparse Matrix-Vector Multiplication Kernels
Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architec...
Vasileios Karakasis, Georgios I. Goumas, Nectarios...
EUROPAR
2010
Springer
14 years 10 months ago
Optimized Dense Matrix Multiplication on a Many-Core Architecture
Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
ICS
1995
Tsinghua U.
15 years 1 months ago
Data Forwarding in Scalable Shared-Memory Multiprocessors
David Koufaty, Xiangfeng Chen, David K. Poulsen, J...
EUROPAR
2006
Springer
15 years 1 months ago
Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on IBM Cyclops-64 (C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...