Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlappi...
fficient Programming Abstractions for Heterogeneous Multicore Systems on Chip Alastair D. Reid Krisztian Flautner Edmund Grimley-Evans ARM Ltd Yuan Lin University of Michigan The ...
Real-time graphics hardware continues to offer improved resources for programmable vertex and fragment shaders. However, shader programmers continue to write shaders that require ...
Andrew Riffel, Aaron E. Lefohn, Kiril Vidimce, Mar...
Random access memory (RAM) is tightly-constrained in many embedded systems. This is especially true for the least expensive, lowest-power embedded systems, such as sensor network ...
Transactional Memory (TM) simplifies parallel programming by supporting atomic and isolated execution of user-identified tasks. To date, TM programming has required the use of l...
Woongki Baek, Chi Cao Minh, Martin Trautmann, Chri...