As on-chip integration matures, single-chip system designers must not only be concerned with component-level issues such as performance and power, but also with onchip system-leve...
A multiprocessor prefetch scheme is described in which a miss is followed by a prefetch of a group of lines, a neighborhood, surrounding the demand-fetched line. The neighborhood ...
Modern computers have taken advantage of the instruction-level parallelism (ILP) available in programs with advances in both architecture and compiler design. Unfortunately, large...
The allocation of lock objects to critical sections in concurrent programs affects both performance and correctness. Recent work explores automatic lock allocation, aiming primari...
Richard L. Halpert, Christopher J. F. Pickett, Cla...
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-cont...