Sciweavers

PPOPP
2015
ACM
9 years 11 months ago
Adaptive GPU cache bypassing
Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are ineffi...
Yingying Tian, Sooraj Puthoor, Joseph L. Greathous...
PPOPP
2015
ACM
9 years 11 months ago
Predicate RCU: an RCU for scalable concurrent updates
Read-copy update (RCU) is a shared memory synchronization mechanism with scalable synchronization-free reads that nevertheless execute correctly with concurrent updates. To guaran...
Maya Arbel, Adam Morrison
PPOPP
2015
ACM
9 years 11 months ago
A collection-oriented programming model for performance portability
This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operator...
Saurav Muralidharan, Michael Garland, Bryan C. Cat...
PPOPP
2015
ACM
9 years 11 months ago
Diagnosing the causes and severity of one-sided message contention
Nathan R. Tallent, Abhinav Vishnu, Hubertus van Da...
PPOPP
2015
ACM
9 years 11 months ago
Optimization for performance and energy for batched matrix computations on GPUs
As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size indepe...
Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stani...