We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
GridFTP, designed using the Globus XIO framework, is one of the most popular methods for performing data transfers in the Grid environment. But the performance of GridFTP in WA...
Hari Subramoni, Ping Lai, Rajkumar Kettimuthu, Dha...
We present a new benchmark suite for parallel computers. SPEComp targets mid-size parallel servers. It includes a number of science/engineering and data processing applic...
Vishal Aslot, Max J. Domeika, Rudolf Eigenmann, Gr...
The Sony–Toshiba–IBM Cell Broadband Engine (Cell/B.E.) is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-process...
David A. Bader, Virat Agarwal, Kamesh Madduri, Seu...
This paper describes an approach to performance analysis of systems that combine two major characteristics: real-time behaviour and a parallel computational structure. It ...