PPOPP
2015
ACM
8 years 5 days ago
Supporting multiple accelerators in high-level programming models
Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering...
Yonghong Yan 0001, Pei-Hung Lin, Chunhua Liao, Bro...
PPOPP
2015
ACM
8 years 5 days ago
Adaptive GPU cache bypassing
Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are ine...
Yingying Tian, Sooraj Puthoor, Joseph L. Greathous...
PPOPP
2015
ACM
8 years 5 days ago
Predicate RCU: an RCU for scalable concurrent updates
Read-copy update (RCU) is a shared memory synchronization mechanism with scalable synchronization-free reads that nevertheless execute correctly with concurrent updates. To guaran...
Maya Arbel, Adam Morrison
PPOPP
2015
ACM
8 years 5 days ago
A collection-oriented programming model for performance portability
This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operator...
Saurav Muralidharan, Michael Garland, Bryan C. Cat...
PPOPP
2015
ACM
8 years 5 days ago
Optimization for performance and energy for batched matrix computations on GPUs
As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size indepe...
Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stani...
PPOPP
2015
ACM
8 years 5 days ago
Effects of source-code optimizations on GPU performance and energy consumption
This paper studies the effects of source-code optimizations on the performance, power draw, and energy consumption of a modern compute GPU. We evaluate 128 versions of two n-body ...
Jared Coplin, Martin Burtscher
PPOPP
2015
ACM
8 years 5 days ago
More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent
In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure alg...
Vincent Gramoli
PPOPP
2015
ACM
8 years 5 days ago
SYNC or ASYNC: time to fuse for distributed graph-parallel computation
Large-scale graph-structured computation usually exhibits iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence conditi...
Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang,...
PPOPP
2015
ACM
8 years 5 days ago
A library for portable and composable data locality optimizations for NUMA systems
Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA) and accesses to remote memory locations take more time than local memory accesses. Opt...
Zoltan Majo, Thomas R. Gross