Sciweavers

» Efficient Coupling of Parallel Applications Using PAWS
CF
2004
ACM
15 years 6 months ago
MaRS: a macro-pipelined reconfigurable system
We introduce MaRS, a reconfigurable, parallel computing engine with special emphasis on scalability, lending itself to the computation-/data-intensive multimedia data processing a...
Nozar Tabrizi, Nader Bagherzadeh, Amir Hosein Kama...
ISCAPDCS
2001
15 years 1 month ago
End-user Tools for Application Performance Analysis Using Hardware Counters
One purpose of the end-user tools described in this paper is to give users a graphical representation of performance information that has been gathered by instrumenting an applica...
Kevin S. London, Jack Dongarra, Shirley Moore, Phi...
LCPC
1994
Springer
15 years 4 months ago
Optimizing Array Distributions in Data-Parallel Programs
Data parallel programs are sensitive to the distribution of data across processor nodes. We formulate the reduction of inter-node communication as an optimization on a colored gra...
Krishna Kunchithapadam, Barton P. Miller
MICRO
2010
IEEE
14 years 10 months ago
Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels
Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application ker...
Michael Steffen, Joseph Zambreno
ISPAN
2005
IEEE
15 years 6 months ago
An Efficient MPI-IO for Noncontiguous Data Access over InfiniBand
Noncontiguous data access is a very common access pattern in many scientific applications. Using POSIX I/O to access many pieces of noncontiguous data segments will generate a lot...
Ding-Yong Hong, Ching-Wen You, Yeh-Ching Chung