Modern out-of-order processors tolerate long latency memory operations by supporting a large number of inflight instructions. This is particularly useful in numerical applications...
We describe the design of a runtime system for a fine-grained concurrent object-oriented (actor) language and its performance. The runtime system provides considerable flexibility...
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault tolerance of current high-performance parall...
Salvador Coll, Eitan Frachtenberg, Fabrizio Petrin...
This paper analyzes the performance of concurrent (index) scan operations in both record (NSM/PAX) and column (DSM) disk storage models and shows that existing scheduling policies...
Software or hardware data cache prefetching is an efficient way to hide cache miss latency. However effectiveness of the issued prefetches have to be monitored in order to maximi...