We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
This paper describes the design and evaluation of a two-level on-line scheduler that dynamically schedules a stream of sequential and multi-threaded batch jobs on large...
Marco Pasquali, Ranieri Baraglia, Gabriele Capanni...
Computing the minimal elements of a partially ordered finite set (poset) is a fundamental problem in combinatorics with numerous applications such as polynomial expression optimiz...
Charles E. Leiserson, Marc Moreno Maza, Liyun Li, ...
Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...
Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...