Abstract: Sharing patterns in shared-memory multiprocessors are the key to performance: uniprocessor latency-tolerating techniques such as out-of-order execution and non-blocking c...
Prefetching offers the potential to improve the performance of linked data structure (LDS) traversals. However, previously proposed prefetching methods only work well when there i...
Magnus Karlsson, Fredrik Dahlgren, Per Stenströ...
This paper describes a new instruction-supply mechanism, called the eXtended Block Cache (XBC). The goal of the XBC is to improve on the Trace Cache (TC) hit rate, while providing...
Clustered microarchitectures are an effective approach to reducing the penalties caused by wire delays inside a chip. In fact, current superscalar processors have a two-cluster mic...
Ramon Canal, Joan-Manuel Parcerisa, Antonio Gonzá...
The paper presents PowerMANNA, a distributed-memory parallel computer system based on the 64-bit PowerPC MPC620 processor. The PowerMANNA node architecture supports all the sophi...
With increasing chip densities, future microprocessor designs have the opportunity to integrate many of the traditional system-level modules onto the same chip as the processor. So...
This paper explores area/parallelism tradeoffs in the design of distributed shared-memory (DSM) multiprocessors built out of large single-chip computing nodes. In this context, are...