Abstract. Performance of the on-chip cache is critical for processor. The multithread program model usually employed by on-chip many-core architectures may have effects on cache ac...
Processing and analyzing large volumes of data plays an increasingly important role in many domains of scienti c research. High-level language and compiler support for developing ...
Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, pr...
Hassan Chafi, Jared Casper, Brian D. Carlstrom, Au...
Large message latencies often lead to poor performance of parallel applications. In this paper, we investigate a latency-tolerating technique that immediately releases all blocking...
On machines with high-performance processors, the memory system continues to be a performance bottleneck. Compilers insert prefetch operations and reorder data accesses to improve...
Nathaniel McIntosh, Sandya Mannarswamy, Robert Hun...