Abstract. To effectively parallelize real programs, parallelizing compilers need powerful symbolic analysis techniques [13, 6]. In previous work we have introduced an algorithm calle...
Abstract— In this paper we address the problem of the architectural exploration from the energy/performance point of view of a VLIW processor for embedded systems. We also consid...
Abstract. Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to exactly partition the work among the available processors ...
We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing...
This paper presents a package, called Heterogeneous PBLAS (HeteroPBLAS), which is built on top of PBLAS and provides optimized parallel basic linear algebra subprograms for hetero...
Ravi Reddy Manumachu, Alexey L. Lastovetsky, Pedro...