Distributed and Parallel Computing

PPOPP
2015
ACM

Supporting multiple accelerators in high-level programming models

10 years 1 month ago

Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering...

Yonghong Yan 0001, Pei-Hung Lin, Chunhua Liao, Bro...

PPOPP
2015
ACM

Adaptive GPU cache bypassing

10 years 1 month ago

Download www.computermachines.org

Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are ine...

Yingying Tian, Sooraj Puthoor, Joseph L. Greathous...

PPOPP
2015
ACM

Predicate RCU: an RCU for scalable concurrent updates

10 years 1 month ago

Download www.cs.technion.ac.il

Read-copy update (RCU) is a shared memory synchronization mechanism with scalable synchronization-free reads that nevertheless execute correctly with concurrent updates. To guaran...

Maya Arbel, Adam Morrison

PPOPP
2015
ACM

A collection-oriented programming model for performance portability

10 years 1 month ago

Download www.cs.utah.edu

This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operator...

Saurav Muralidharan, Michael Garland, Bryan C. Cat...

PPOPP
2015
ACM

Diagnosing the causes and severity of one-sided message contention

10 years 1 month ago

Download sc14.supercomputing.org

Nathan R. Tallent, Abhinav Vishnu, Hubertus van Da...

PPOPP
2015
ACM

Optimization for performance and energy for batched matrix computations on GPUs

10 years 1 month ago

Download www.netlib.org

As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size indepe...

Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stani...

PPOPP
2015
ACM

Effects of source-code optimizations on GPU performance and energy consumption

10 years 1 month ago

Download cs.txstate.edu

This paper studies the effects of source-code optimizations on the performance, power draw, and energy consumption of a modern compute GPU. We evaluate 128 versions of two n-body ...

Jared Coplin, Martin Burtscher

PPOPP
2015
ACM

More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent

10 years 1 month ago

Download sydney.edu.au

In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure alg...

Vincent Gramoli

PPOPP
2015
ACM

SYNC or ASYNC: time to fuse for distributed graph-parallel computation

10 years 1 month ago

Download ipads.se.sjtu.edu.cn

Large-scale graph-structured computation usually exhibits iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence conditi...

Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang,...

PPOPP
2015
ACM

A library for portable and composable data locality optimizations for NUMA systems

10 years 1 month ago

Download e-collection.library.ethz.ch

Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA) and accesses to remote memory locations take more time than local memory accesses. Opt...

Zoltan Majo, Thomas R. Gross
