Sciweavers

753 search results - page 17 / 151
» Mechanisms for Mapping High-Level Parallel Performance Data
Sort
View
HIPS
1998
IEEE
15 years 4 months ago
Implementing Automatic Coordination on Networks of Workstations
Distributed shared objects are a well known approach to achieve independenceof the memory model for parallel programming. The illusion of shared (global) objects is a conabstracti...
Christian Weiß, Jürgen Knopp, Hermann H...
ASAP
1996
IEEE
145views Hardware» more  ASAP 1996»
15 years 3 months ago
A Synthesis System For Bus-Based Wavefront Array Architectures
A datapath synthesis system (DPSS) for a bus-based wavefront array architecture, called rDPA (reconfigurable datapath architecture), is presented. An internal data bus to the arra...
Reiner W. Hartenstein, Jürgen Becker, Michael...
ASPLOS
2009
ACM
15 years 6 months ago
Performance analysis of accelerated image registration using GPGPU
This paper presents a performance analysis of an accelerated 2-D rigid image registration implementation that employs the Compute Unified Device Architecture (CUDA) programming e...
Peter Bui, Jay B. Brockman
IPPS
2000
IEEE
15 years 4 months ago
Dynamic Data Layouts for Cache-Conscious Factorization of DFT
Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimizationtechniques for computing ...
Neungsoo Park, Dongsoo Kang, Kiran Bondalapati, Vi...
PDP
2010
IEEE
15 years 5 months ago
A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform
- We present a parallel conjugate gradient solver for the Poisson problem optimized for multi-GPU platforms. Our approach includes a novel heuristic Poisson preconditioner well sui...
Marco Ament, Günter Knittel, Daniel Weiskopf,...