Aggressive hardware-based and software-based prefetch algorithms for hiding memory access latencies were proposed to bridge the gap of the expanding speed disparity between proces...
This study deals with the Delay-Constrained Minimum Cost Loop Problem (DC-MCLP) of finding several loops from a source node. The DC-MCLP consists of finding a set of minimum cost ...
Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism ar...
We characterize the performance and power attributes of the conjugate gradient (CG) sparse solver which is widely used in scientific applications. We use cycle-accurate simulatio...
Konrad Malkowski, Ingyu Lee, Padma Raghavan, Mary ...
This paper is about a new framework for high performance thread scheduling based on the work stealing principle when processors may run at different speed. We also take into accou...