With the advent of ubiquitous multi-core architectures, a major challenge is to simplify parallel programming. One way to tame one of the main sources of programming complexity, n...
Luis Ceze, Pablo Montesinos, Christoph von Praun, ...
Task-selection policies are critical to the performance of any architecture that uses speculation to extract parallel tasks from a sequential thread. This paper demonstrates that ...
Mayank Agarwal, Kshitiz Malik, Kevin M. Woley, Sam...
This paper proposes a hardware transactional memory (HTM) system called LogTM Signature Edition (LogTM-SE). LogTM-SE uses signatures to summarize a transaction's readand writ...
Luke Yen, Jayaram Bobba, Michael R. Marty, Kevin E...
The emergence of multicore architectures will lead to an increase in the use of multithreaded applications that are prone to synchronization bugs, such as data races. Software sol...
A thread executing on a simultaneous multithreading (SMT) processor that experiences a long-latency load will eventually stall while holding execution resources. Existing long-lat...
This paper presents hardware and software mechanisms to enable concurrent direct network access (CDNA) by operating systems running within a virtual machine monitor. In a conventi...
Jeffrey Shafer, David Carr, Aravind Menon, Scott R...
3D integration technology greatly increases transistor density while providing faster on-chip communication. 3D implementations of processors can simultaneously provide both laten...
Utilizing the nonuniform latencies of SDRAM devices, access reordering mechanisms alter the sequence of main memory access streams to reduce the observed access latency. Using a r...