This paper presents a package, called Heterogeneous PBLAS (HeteroPBLAS), which is built on top of PBLAS and provides optimized parallel basic linear algebra subprograms for hetero...
Ravi Reddy Manumachu, Alexey L. Lastovetsky, Pedro...
The problem of counting specified combinations of a given set of variables arises in many statistical and data mining applications. To solve this problem, we introduce the PDtree...
Chad Scherrer, Nathaniel Beagley, Jarek Nieplocha,...
Programs with irregular patterns of dynamic data structures and/or those with complicated control structures such as recursion are notoriously difficult to parallelize efficient...
Sheng Li, Amit Kashyap, Shannon K. Kuntz, Jay B. B...
OpenMP is widely used for shared memory parallel programming and is especially useful for the parallelisation of loops. When it comes to task parallelism, however, OpenMP is less p...
Oliver Sinnen, Jsun Pe, Alexander Vladimirovich Ko...
Multi-lane vector processors achieve excellent computational throughput for programs with high data-level parallelism (DLP). However, application phases without significant DLP ar...