While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high performance computations, the tedious process of performance tuning required...
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Jan...
This paper analyzes the impact of hardware multithreading support on the performance of distributed shared-memory (DSM) multiprocessors built out of heterogeneous, single-chip compu...
Renato J. O. Figueiredo, Jeffrey P. Bradford, Jos&...
Speed improvements in today's processors have largely been delivered in the form of multiple cores, increasing the importance of abstractions that ease parallel programming. Software...
We present a new cache-oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations as though gigabytes of data could reside in an enormou...
Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Ha...
A new stochastic method for locating the global minimum of a multidimensional function inside a rectangular hyperbox is presented. A sampling technique is employed that makes use ...