We introduce MaRS, a reconfigurable, parallel computing engine with special emphasis on scalability, lending itself to the computation-/data-intensive multimedia data processing a...
Nozar Tabrizi, Nader Bagherzadeh, Amir Hosein Kama...
One purpose of the end-user tools described in this paper is to give users a graphical representation of performance information that has been gathered by instrumenting an applica...
Kevin S. London, Jack Dongarra, Shirley Moore, Phi...
Data parallel programs are sensitive to the distribution of data across processor nodes. We formulate the reduction of inter-node communication as an optimization on a colored gra...
Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application ker...
Noncontiguous data access is a very common access pattern in many scientific applications. Using POSIX I/O to access many pieces of noncontiguous data segments will generate a lot...