Sciweavers

5171 search results - page 695 / 1035
» Deterministic Parallel Processing
PPOPP
2010
ACM
16 years 2 months ago
Model-driven autotuning of sparse matrix-vector multiply on GPUs
We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing...
Jee W. Choi, Amik Singh, Richard W. Vuduc
ICCD
2002
IEEE
16 years 1 month ago
Subword Sorting with Versatile Permutation Instructions
Subword parallelism has succeeded in accelerating many multimedia applications. Subword permutation instructions have been proposed to efficiently rearrange subwords in or among r...
Zhijie Shi, Ruby B. Lee
ICCAD
2001
IEEE
16 years 1 month ago
A System for Synthesizing Optimized FPGA Hardware from MATLAB
Efficient high-level design tools that can map behavioral descriptions to FPGA architectures are one of the key requirements to fully leverage FPGA for high-throughput computatio...
Malay Haldar, Anshuman Nayak, Alok N. Choudhary, P...
PDP
2010
IEEE
15 years 12 months ago
Trusted Interaction Patterns in Large-scale Enterprise Service Networks
The evolution towards cross-organizational collaboration and interaction patterns has led to the emergence of scalable, Web services-based composition infrastructures. T...
Florian Skopik, Daniel Schall, Schahram Dustdar
PDP
2009
IEEE
15 years 11 months ago
High Throughput Intra-Node MPI Communication with Open-MX
The increasing number of cores per node in high-performance computing requires an efficient intra-node MPI communication subsystem. Most existing MPI implementations rel...
Brice Goglin