Many parallel applications from scientific computing use MPI global communication operations to collect or distribute data. Since the execution times of these communication opera...
Homology detection in large databases is probably the most time-consuming operation in molecular genetic computing systems. Moreover, the progress made all around the world conc...
In this paper we present a multi-GPU parallel volume rendering implementation built using the MapReduce programming model. We give implementation details of the library, including s...
Jeff A. Stuart, Cheng-Kai Chen, Kwan-Liu Ma, John ...
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
Stencil computation (SC) is of critical importance for broad scientific and engineering applications. However, it is a challenge to optimize complex, high-order SC on emerging clus...
Liu Peng, Richard Seymour, Ken-ichi Nomura, Rajiv ...