Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application ker...
OpenMP relies heavily on barrier synchronization to coordinate the work of threads that are performing the computations in a parallel region. A good implementation of barriers is ...
Ramachandra C. Nanjegowda, Oscar Hernandez, Barbar...
In this paper we describe a flexible and efficient new algorithm for model order reduction of parameterized systems. The method is based on the reformulation of the parametric s...
Selecting the close-to-optimal collective algorithm based on the parameters of the collective call at run time is an important step for achieving good performance of MPI applicatio...
Jelena Pjesivac-Grbovic, George Bosilca, Graham E....
We consider scheduling problems in which a job consists of components of different types to be processed on m machines. Each machine is capable of processing components of a singl...