Sciweavers

52 search results - page 2 / 11
» Strategies and Implementation for Translating OpenMP Code fo...
HPCA
2002
IEEE
14 years 4 months ago
CableS: Thread Control and Memory Management Extensions for Shared Virtual Memory Clusters
Clusters of high-end workstations and PCs are currently used in many application domains to perform large-scale computations or as scalable servers for I/O bound tasks. Although c...
Peter Jamieson, Angelos Bilas
SC
2000
ACM
13 years 8 months ago
Performance of Hybrid Message-Passing and Shared-Memory Parallelism for Discrete Element Modeling
The current trend in HPC hardware is towards clusters of shared-memory (SMP) compute nodes. For applications developers the major question is how best to program these SMP cluster...
D. S. Henty
ICCS
2005
Springer
13 years 10 months ago
Fast Expression Templates
Expression templates (ET) can significantly reduce the implementation effort of mathematical software. For some compilers, especially for those of supercomputers, it ca...
Jochen Härdtlein, Alexander Linke, Christoph ...
FPL
2009
Springer
13 years 9 months ago
Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors
Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor...
Nachiket Kapre, André DeHon
IPPS
2009
IEEE
13 years 11 months ago
Phaser accumulators: A new reduction construct for dynamic parallelism
A reduction is a computation in which a common operation, such as a sum, is to be performed across multiple pieces of data, each supplied by a separate task. We introduce phaser a...
Jun Shirako, David M. Peixotto, Vivek Sarkar, Will...