We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...
Executing large number of independent tasks or tasks that perform minimal inter-task communication in parallel is a common requirement in many domains. In this paper, we present o...
Xiaohong Qiu, Jaliya Ekanayake, Scott Beason, Thil...
: The architecture of the IBM Cell BE processor represents a new approach for designing CPUs. The fast execution of legacy software has to stand back in order to achieve very high ...
Timo Schneider, Torsten Hoefler, Simon Wunderlich,...
A common statistical problem is that of nding the median element in a set of data. This paper presents a fastand portable parallel algorithm for nding the median given a set of el...
Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. Existin...
Isaac Gelado, Javier Cabezas, Nacho Navarro, John ...