In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. Its main goal is to provide pa...
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms (PBLAS). The PBLAS are targeted at distributed vector-vector, matrix-vector and matrix-matrix...
Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...
We present a new programming language designed to allow the convenient expression of algorithms for a parallel random access machine (PRAM). The language attempts to satisfy two p...
We study the scalability of 2-D discrete wavelet transform algorithms on fine-grained parallel architectures. The principal operation in the 2-D DWT is the filtering operation us...
Jamshed N. Patel, Ashfaq A. Khokhar, Leah H. Jamie...
In this paper we present a new approach to benchmark the performance of shared memory systems. This approach focuses on recognizing how far off the performance of a given memor...