—Memory bandwidth has always been a critical factor for the performance of many data intensive applications. The increasing processor performance, and the advert of single chip m...
Felipe Cabarcas, Alejandro Rico, Yoav Etsion, Alex...
It is possible to implement the parallel random access machine (PRAM) on a chip multiprocessor (CMP) efficiently with an emulated shared memory (ESM) architecture to gain easy par...
We present a case study parallelizing streaming aggregation on three different parallel hardware architectures. Aggregation is a performance-critical operation for data summarizat...
Scott Schneider, Henrique Andrade, Bugra Gedik, Ku...
We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and...
Christopher J. Hughes, Radek Grzeszczuk, Eftychios...
The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, QR factorizations to t...