In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that su...
Runtime parallel optimization has been suggested as a means to overcome the difficulties of parallel programming. For runtime parallel optimization to be effective, parallelism a...
David A. Penry, Daniel J. Richins, Tyler S. Harris...
In this paper, we study the problem of scheduling parallel loops at compile-time for a heterogeneous network of machines. We consider heterogeneity in three aspects of parallel pr...
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses...
We introduce Scioto, Shared Collections of Task Objects, a lightweight framework for providing task management on distributed memory machines under one-sided and globalview parall...
James Dinan, Sriram Krishnamoorthy, D. Brian Larki...