Abstract. This paper is motivated by the desire to provide an efficient and scalable software cache implementation of OpenMP on multicore and manycore architectures in general, and...
Chen Chen, Joseph B. Manzano, Ge Gan, Guang R. Gao...
Abstract. Prefetching transfers a data item in advance from its storage location to its usage location so that communication is hidden and does not delay computation. We present a ...
Michael Klemm, Jean Christophe Beyler, Ronny T. La...
We propose a new paradigm for building scalable distributed systems. Our approach does not require dealing with message-passing protocols—a major complication in existing distri...
Marcos Kawazoe Aguilera, Arif Merchant, Mehul A. S...
Parallel programs that modify shared data in a cachecoherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisiti...
We present the Mindicator, a set implementation customized for shared memory runtime systems. The Mindicator is optimized for constant-time querying of its minimum element, while ...