Sciweavers

PPOPP
2016
ACM
Auto-vectorizing a large-scale production unstructured-mesh CFD application
For modern x86-based CPUs with increasingly long vector lengths, achieving good vectorization has become very important for gaining higher performance. Using very explicit SIMD ...
Gihan R. Mudalige, I. Z. Reguly, Michael B. Giles
PPOPP
2016
ACM
High performance model based image reconstruction
In Computed Tomography (CT) methods, Model Based Iterative Reconstruction (MBIR) produces higher quality images than commonly used Filtered Backprojection (FBP) but at a very high...
Xiao Wang, Amit Sabne, Sherman J. Kisner, Anand Ra...
PPOPP
2016
ACM
Declarative coordination of graph-based parallel programs
Declarative programming has been hailed as a promising approach to parallel programming since it makes it easier to reason about programs while hiding the implementation details o...
Flávio Cruz, Ricardo Rocha, Seth Copen Gold...
PPOPP
2016
ACM
Benchmarking weak memory models
To achieve good multi-core performance, modern microprocessors have weak memory models, rather than enforce sequential consistency. This gives the programmer a wide scope for choo...
Carl G. Ritson, Scott Owens
PPOPP
2016
ACM
Be my guest: MCS lock now welcomes guests
The MCS lock is one of the most prevalent queuing locks. It provides fair scheduling and high performance on massively parallel systems. However, the MCS lock mandates a bring-you...
Tianzheng Wang, Milind Chabbi, Hideaki Kimura
PPOPP
2016
ACM
OPR: deterministic group replay for one-sided communication
Xuehai Qian, Koushik Sen, Paul Hargrove, Costin Ia...
PPOPP
2016
ACM
Accelerating Dynamic Data Race Detection Using Static Thread Interference Analysis
Precise dynamic race detectors report an error if and only if more than one thread concurrently exhibits conflict on a memory access. They insert instrumentation at compile time ...
Peng Di, Yulei Sui
PPOPP
2016
ACM
Lease/release: architectural support for scaling contended data structures
High memory contention is generally agreed to be a worst-case scenario for concurrent data structures. There has been a significant amount of research effort spent investigating ...
Syed Kamran Haider, William Hasenplaugh, Dan Alist...
PPOPP
2016
ACM
Optimistic concurrency with OPTIK
We introduce OPTIK, a new practical design pattern for designing and implementing fast and scalable concurrent data structures. OPTIK relies on the commonly-used technique of vers...
Rachid Guerraoui, Vasileios Trigonakis
PPOPP
2016
ACM
Performance portable GPU code generation for matrix multiplication
Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming la...
Toomas Remmelg, Thibaut Lutz, Michel Steuwer, Chri...