Chip multiprocessors designed for streaming applications such as Cell BE offer impressive peak performance but suffer from limited bandwidth to offchip main memory. As the number o...
: Sorting large data sets has always been an important application, and hence has been one of the benchmark applications on new parallel architectures. We present a parallel sortin...
Abstract. Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...
In this paper we describe the design and implementation of CellSort − a high performance distributed sort algorithm for the Cell processor. We design CellSort as a distributed b...
The problem of merging two intransitive sorted sequences (that is, to generate a sorted total order without the transitive property) is considered. A cost-optimal parallel merging...