Exploiting locality at run time complements compiler-based approaches for applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Experience has shown that the power consumption of sensors and other wireless computational devices is often dominated by their communication patterns. We present a practical real...
The goal of this work is to explore architectural mechanisms for supporting explicit communication in cache-coherent shared-memory multiprocessors. The motivation stems from the ob...
Memory-intensive threads can hoard shared resources without making progress on a simultaneous multithreading (SMT) processor, thereby hindering overall system performance. A recent promisi...
Task-based execution provides a universal approach to dynamic load balancing for irregular applications. Tasks are arbitrary units of work that are created dynamically at runtim...