It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On th...
The growing complexity of modern processors has made the generation of highly efficient code increasingly difficult. Manual code generation is very time consuming, but it is oft...
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures ...
Instruction and data address traces are widely used by computer designers for quantitative evaluations of new architectures and workload characterization, as well as by software de...
Milena Milenkovic, Aleksandar Milenkovic, Martin B...
Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, allowing programmers to exploit more effectively the s...
Colin Blundell, Joe Devietti, E. Christopher Lewis...