This paper presents the design and analysis of parallel approximation algorithms for facility-location problems, including NC and RNC algorithms for (metric) facility location, k-...
In this paper, we present a new hybrid branch predictor called the GoStay2, which can effectively reduce indirect misprediction rates. The GoStay2 has two different mechanisms comp...
Application performance tuning is a complex process that requires assembling various types of information and correlating it with source code to pinpoint the causes of performance...
John M. Mellor-Crummey, Robert J. Fowler, David B....
Abstract—Sharing patterns in shared-memory multiprocessors are the key to performance: uniprocessor latencytolerating techniques such as out-of-order execution and non-blocking c...
A multiprocessor prefetch scheme is described in which a miss is followed by a prefetch of a group of lines, a neighborhood, surrounding the demand-fetched line. The neighborhood ...