Sciweavers

224 search results - page 23 / 45
» A Flexible Class of Parallel Matrix Multiplication Algorithm...
PAMI
2008
14 years 9 months ago
Concurrent Computation of Attribute Filters on Shared Memory Parallel Machines
Morphological attribute filters have not previously been parallelized mainly because they are both global and nonseparable. We propose a parallel algorithm that achieves efficient ...
Michael H. F. Wilkinson, Hui Gao, Wim H. Hesselink...
CGO
2004
IEEE
15 years 1 month ago
Custom Data Layout for Memory Parallelism
In this paper, we describe a generalized approach to deriving a custom data layout in multiple memory banks for array-based computations, to facilitate high-bandwidth parallel mem...
Byoungro So, Mary W. Hall, Heidi E. Ziegler
ICPP
1993
IEEE
15 years 1 month ago
Dependence Analysis and Architecture Design for Bit-Level Algorithms
In designing application-specific bit-level architectures and in programming existing bit-level processor arrays, it is necessary to expand a word-level algorithm into its bit-...
Weijia Shang, Benjamin W. Wah
ICPPW
2007
IEEE
15 years 3 months ago
A Quality-Driven Algorithm for Resource Scheduling Based on Market Model on Grid
Several challenges about computational grid exist in integrating, coordinating and managing of resources and scheduling of applications, due to distributed resources at various le...
Lei Tang, Zhiyi Yang, Zhiwen Yu, Yunlan Wang
CCGRID
2011
IEEE
14 years 1 month ago
Small Discrete Fourier Transforms on GPUs
Efficient implementations of the Discrete Fourier Transform (DFT) for GPUs provide good performance with large data sizes, but are not competitive with CPU code for small data ...
S. Mitra, A. Srinivasan