Sciweavers

1075 search results - page 119 / 215
» Parallel Programming with Transactional Memory
IPPS
2009
IEEE
15 years 11 months ago
Scalable RDMA performance in PGAS languages
Partitioned Global Address Space (PGAS) languages provide a unique programming model that can span shared-memory multiprocessor (SMP) architectures, distributed memory machines, o...
Montse Farreras, George Almási, Calin Casca...
ASAP
2007
IEEE
15 years 11 months ago
0/1 Knapsack on Hardware: A Complete Solution
We present a memory efficient, practical, systolic, parallel architecture for the complete 0/1 knapsack dynamic programming problem, including backtracking. This problem was inte...
K. Nibbelink, S. Rajopadhye, R. McConnell
PPOPP
2011
ACM
14 years 7 months ago
GRace: a low-overhead mechanism for detecting data races in GPU programs
In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. Many application developers, including those with no prior parallel program...
Mai Zheng, Vignesh T. Ravi, Feng Qin, Gagan Agrawa...
ICFP
2012
ACM
13 years 7 months ago
Nested data-parallelism on the GPU
Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs but, because of their Single-Instruction-Multiple...
Lars Bergstrom, John H. Reppy
CF
2006
ACM
15 years 8 months ago
Landing OpenMP on Cyclops-64: an efficient mapping of OpenMP to a many-core system-on-a-chip
This paper presents our experience mapping the OpenMP parallel programming model to the IBM Cyclops-64 (C64) architecture. The C64 employs a many-core-on-a-chip design that integrates...
Juan del Cuvillo, Weirong Zhu, Guang R. Gao