This paper studies the effects of source-code optimizations on the performance, power draw, and energy consumption of a modern compute GPU. We evaluate 128 versions of two n-body ...
In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure alg...
Large-scale graph-structured computation usually exhibits an iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence conditi...
Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang,...
Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA), and accesses to remote memory locations take more time than local memory accesses. Opt...
In this paper, we consider concurrent programs in which the shared state consists of instances of linearizable ADTs (abstract data types). We present an automated approach to concurrency ...
Guy Golan-Gueta, G. Ramalingam, Mooly Sagiv, Eran ...