The memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory acc...
Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers,...
- Multithreading aims to tolerate latency by overlapping communication with computation. This report explicates the multithreading capabilities of the EM-X distributed-memory multi...
Andrew Sohn, Yuetsu Kodama, Jui Ku, Mitsuhisa Sato...
Abstract. Digital numbers D are the world’s most popular data representation: nearly all texts, sounds and images are coded somewhere in time and space by binary sequences. The m...
This paper describes the design of a compiler which can translate out-of-core programs written in a data parallel language like HPF. Such a compiler is required for compiling larg...
Rajeev Thakur, Rajesh Bordawekar, Alok N. Choudhar...
Automatic tuning has emerged as a solution to provide high-performance libraries for fast changing, increasingly complex computer architectures. We distinguish offline adaptation (...