This paper presents a two-part study on managing distributed NUCA (Non-Uniform Cache Architecture) L2 caches in a future manycore processor to obtain high singlethread program per...
The message-passing paradigm is now widely accepted and used mainly for inter-process communication in distributed memory parallel systems. However, one of its disadvantages is the...
Abstract. We reexamine the limits of parallelism available in programs, using runtime reconstruction of program data-flow graphs. While limits of parallelism have been examined in...
This paper proposes a simple and efficient implementation method for a hierarchical coarse grain task parallel processing scheme on a SMP machine. OSCAR multigrain parallelizing c...
We describe the implementation of an out-of-core, distribution-based sorting program on a cluster using FG, a multithreaded programming framework. FG mitigates latency from disk-I/...
Priya Natarajan, Thomas H. Cormen, Elena Riccio St...