The speed gap between processor and memory continues to limit performance. To address this problem, we explore the potential of eliminating Zero Loads—loads accessing memory loc...
Md. Mafijul Islam, Sally A. McKee, Per Stenstr&oum...
Abstract--The size, heterogeneity and dynamism of the execution platforms of scientific applications, like computational grids, make using those platforms complex. Furthermore, tod...
Recent advances in polyhedral compilation technology have made it feasible to automatically transform affine sequential loop nests for tiled parallel execution on multi-core proce...
Abstract— Distributed stream processing systems offer a highly scalable and dynamically configurable platform for time-critical applications ranging from real-time, exploratory ...
Lisa Amini, Navendu Jain, Anshul Sehgal, Jeremy Si...
Memory system bottlenecks limit performance for many applications, and computations with strided access patterns are among the hardest hit. The streams used in such applications h...