The designs of high-performance processor architectures are moving toward the integration of a large number of processing cores on a single chip. The IBM Cyclops-64 (C64)...
Abstract. Sparse matrix-vector multiplication is an important computational kernel that tends to perform poorly on modern processors, largely because of its high ratio of memory op...
For modern processor designs in nanometer technologies, both block and interconnect pipelining are needed to achieve multi-gigahertz clock frequency, but previous approaches cons...
Yuchun Ma, Zhuoyuan Li, Jason Cong, Xianlong Hong,...
One of the most important problems faced by microarchitecture designers is the poor scalability of some of the current solutions with increased clock frequencies and wider pipelin...
In this paper, we use the tensor product notation as the framework of a programming methodology for designing block recursive algorithms on various computer networks. In our previ...