Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large scale data analysis on computing clouds. It is therefore becoming important to a...
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-cont...
Modern server farm and cluster sites consume large quantities of energy both to power and cool the machines in the site. At the same time, less power supply redundancy is offered ...
Ramakrishna Kotla, Soraya Ghiasi, Tom W. Keller, F...
This paper presents Cooperative Cache Partitioning (CCP) to allocate cache resources among threads concurrently running on CMPs. Unlike cache partitioning schemes that use a singl...
Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...