Testing the performance scalability of parallel programs can be a time-consuming task, involving many performance runs for different computer configurations, processor numbers, and p...
Allen D. Malony, Vassilis Mertsiotakis, Andreas Qu...
Helper locks allow programs with large parallel critical sections, called parallel regions, to execute more efficiently by enlisting processors that might otherwise be waiting on ...
A proper understanding of communication patterns of parallel applications is important to optimize application performance and design better communication subsystems. Communicatio...
The Dryad and DryadLINQ systems offer a new programming model for large-scale data-parallel computing. They generalize previous execution environments such as SQL and MapReduce in...
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Contr...