In this paper, we propose an integrated approach for register-sensitive software pipelining. In this approach, the heuristics proposed in the stage scheduling method of Eichenberg...
Amod K. Dani, V. Janaki Ramanan, Ramaswamy Govinda...
Current practice in the design of application software for high-performance embedded computing systems is characterized by long development times, lack of interoperability with ot...
Abstract. We present a unified approach for expressing high performance numerical linear algebra routines for a class of dense and sparse matrix formats and shapes. As with the Stan...
This paper evaluates the use of per-node multi-threading to hide remote memory and synchronization latencies in a software DSM. As with hardware systems, multi-threading in softwa...
Abstract—This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationshi...