Sciweavers

PPOPP
2009
ACM
14 years 5 months ago
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
GPGPUs have recently emerged as powerful vehicles for generalpurpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
PPOPP
2009
ACM
14 years 5 months ago
Parallelization spectroscopy: analysis of thread-level parallelism in hpc programs
In this paper, we present a thorough analysis of thread-level parallelism available in production High Performance Computing (HPC) codes. We survey a number of techniques that are...
Arun Kejariwal, Calin Cascaval
PPOPP
2009
ACM
14 years 5 months ago
Software transactional distributed shared memory
We have developed a transaction-based approach to distributed shared memory(DSM) that supports object caching and generates path expression prefetches. A path expression specifies...
Alokika Dash, Brian Demsky
PPOPP
2009
ACM
14 years 5 months ago
Serialization sets: a dynamic dependence-based parallel execution model
This paper proposes a new parallel execution model where programmers augment a sequential program with pieces of code called serializers that dynamically map computational operati...
Matthew D. Allen, Srinath Sridharan, Gurindar S. S...
PPOPP
2009
ACM
14 years 5 months ago
Atomic quake: using transactional memory in an interactive multiplayer game server
Transactional Memory (TM) is being studied widely as a new technique for synchronizing concurrent accesses to shared memory data structures for use in multi-core systems. Much of ...
Adrián Cristal, Eduard Ayguadé, Fera...
PPOPP
2009
ACM
14 years 5 months ago
Mapping parallelism to multi-cores: a machine learning based approach
The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-bas...
Zheng Wang, Michael F. P. O'Boyle
PPOPP
2009
ACM
14 years 5 months ago
A compiler-directed data prefetching scheme for chip multiprocessors
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
PPOPP
2009
ACM
14 years 5 months ago
Effective performance measurement and analysis of multithreaded applications
Understanding why the performance of a multithreaded program does not improve linearly with the number of cores in a sharedmemory node populated with one or more multicore process...
Nathan R. Tallent, John M. Mellor-Crummey
PPOPP
2009
ACM
14 years 5 months ago
Parallel thinking
Guy E. Blelloch