Sciweavers

EUROPAR
2009
Springer

High Performance Matrix Multiplication on Many Cores

Moore’s Law suggests that the number of processing cores on a single chip will increase exponentially. Future performance gains will come mainly from thread-level parallelism exploited by multi/many-core processors (MCP). It is therefore necessary to determine how to build MCP hardware and how to program parallelism on it. In this work, we aim to identify the key architectural mechanisms and software optimizations that guarantee high performance for multithreaded programs. As a case study, we customize a dense matrix multiplication algorithm for the Godson-T MCP to demonstrate efficient synergy and interaction between hardware and software. Experiments on a cycle-accurate simulator show that the optimized matrix multiplication achieves 97.1% (124.3 GFLOPS) of the peak performance of Godson-T.
Nan Yuan, Yongbin Zhou, Guangming Tan, Junchao Zhang, Dongrui Fan
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where EUROPAR
Authors Nan Yuan, Yongbin Zhou, Guangming Tan, Junchao Zhang, Dongrui Fan