This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout th...
Abstract. In this article, we propose new parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines. Such octrees a...
We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a mac...
A reduction is a computation in which a common operation, such as a sum, is to be performed across multiple pieces of data, each supplied by a separate task. We introduce phaser a...
Jun Shirako, David M. Peixotto, Vivek Sarkar, Will...
Next-generation high-end Network Processors (NP) must address demands from both diversified applications and ever-increasing traffic pressure. One major challenge is to design an e...
Lei Shi, Yue Zhang 0006, Jianming Yu, Bo Xu, Bin L...