The memory hierarchy of most multicore systems contains one or more levels of cache that is shared among multiple cores. The shared-cache architecture presents many opportunities f...
An effort to formalize the process of software pipelining loops with conditions is presented in this paper. A formal framework for scheduling such loops, based on representing set...
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses...
In this paper, we present a GSA-based technique that performs more e cient and more precise symbolic analysis of predicated assignments, recurrences and index arrays. The e ciency...
We present Lazy Binary Splitting (LBS), a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing Eager Binary Splitting work-stealing...
Alexandros Tzannes, George C. Caragea, Rajeev Baru...