Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...
Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...
The performance of SIMD processors is often limited by the time it takes to transfer data between the centralized control unit and the parallel processor array. This is especially...
Cyclops is a new architecture for high performance parallel computers being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip...
Search-based graph queries, such as finding short paths and isomorphic subgraphs, are dominated by memory latency. If input graphs can be partitioned appropriately, large cluster...
Jonathan W. Berry, Bruce Hendrickson, Simon Kahan,...
In this paper, we propose Runahead Threads (RaT) as a valuable solution for both reducing resource contention and exploiting memory-level parallelism in Simultaneous Multithreaded...