Memory-intensive threads can hoard shared resources without making progress on a multithreading processor (SMT), thereby hindering the overall system performance. A recent promisi...
In contemporary out-of-order superscalar design, high IPC is mainly achieved by exposing high instruction level parallelism (ILP). Scaling issue window size can certainly provide ...
In this paper we present four parallel algorithms to compute any group of eigenvalues and eigenvectors of a Toeplitz-plus-Hankel matrix. These algorithms parallelize a method that...
Image processing applications tend to access their data non-sequentially and reuse that data infrequently. As a result, they tend to perform poorly on conventional memory systems ...
Lixin Zhang, John B. Carter, Wilson C. Hsieh, Sall...
As we move to large manycores, the hardware-based global checkpointing schemes that have been proposed for small shared-memory machines do not scale. Scalability barriers include ...