This paper presents preliminary efforts to develop compilation and execution environments that achieve performance portability of multilevel parallelization on hierarchical archit...
Walden Ko, Mark N. Yankelevsky, Dimitrios S. Nikol...
This paper describes a framework for providing the ability to use multiple specialized data parallel libraries and/or languages within a single application. The ability to use mul...
The relative decline of single-threaded processor performance, coupled with the ongoing shift towards on chip parallelism requires that CAD applications run efficiently on paralle...
—Statistical analysis is widely used for countless scientific applications in order to analyze and infer meaning from data. A key challenge of any statistical analysis package a...
A bulk synchronous computation proceeds in phases that are separated by barrier synchronization. For dynamic bulk synchronous computations that exhibit varying phase-wise computat...