This paper presents a technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on object...
Despite large caches, main-memory access latencies still cause significant performance losses in many applications. Numerous hardware and software prefetching schemes tolerate th...
Zhenlin Wang, Doug Burger, Steven K. Reinhardt, Ka...
In this paper we describe the design and implementation of a flexible, and extensible, just-in-time ARM simulator designed to run co-operatively with a multi-core DSP simulator on...
The Cell processor is a heterogeneous multi-core processor with one Power Processing Engine (PPE) core and eight Synergistic Processing Engine (SPE) cores. Each SPE has a directly...
Kevin O'Brien, Kathryn M. O'Brien, Zehra Sura, Ton...
In this paper, we take the idea of application-level processing on disks to one level further, and focus on an architecture, called Cluster of Active Disks (CAD), where the storag...