Abstract. This paper presents a study of performance optimization of dense matrix multiplication on the IBM Cyclops-64 (C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...
Heterogeneous sensor networks consisting of resource-constrained nodes as well as resource-intensive nodes equipped with high-bandwidth sensors offer significant advantages for dev...
Abstract. Widespread emergence of multicore processors will spur development of parallel applications, exposing programmers to degrees of hardware concurrency hitherto unavailable...
Processor architectures with tens to hundreds of arithmetic units are emerging to handle media processing applications. These applications, such as image coding, image synthesis, ...
Scott Rixner, William J. Dally, Brucek Khailany, P...
This paper proposes efficient algorithms for implementing multicast in heterogeneous workstation/PC clusters. Multicast is an important operation in many scientific and industri...