Concurrent collection classes are widely used in multi-threaded programming, but they provide atomicity only for a fixed set of operations. Software transactional memory (STM) pr...
Nathan Grasso Bronson, Jared Casper, Hassan Chafi,...
Good data layouts improve cache and TLB performance of object-oriented software, but unfortunately, selecting an optimal data layout a priori is NP-hard. This paper introduces layo...
This paper describes Rthreads (Remote threads), a software distributed shared memory system that supports sharing of global variables on clusters of computers with physically dist...
Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective...
Jeffrey Dean, James E. Hicks, Carl A. Waldspurger,...
Multicore processors are an architectural paradigm shift that promises a dramatic increase in performance. But, they also bring an unprecedented level of complexity in algorithmic ...
Daniele Paolo Scarpazza, Oreste Villa, Fabrizio Pe...