At the core of contemporary high performance computer systems is the communication infrastructure. For this reason, there has been a lot of work on providing low-latency, high-ban...
Sven Karlsson, Stavros Passas, George Kotsis, Ange...
This paper describes the implementation and evaluation of the OpenMP compiler designed for the Hitachi SR8000 Super Technical Server. The compiler performs parallelization for the ...
Search-based graph queries, such as finding short paths and isomorphic subgraphs, are dominated by memory latency. If input graphs can be partitioned appropriately, large cluster...
Jonathan W. Berry, Bruce Hendrickson, Simon Kahan,...
Abstract -- Telecommunication network management applications often require application-specific ICs that use large dynamically allocated stored data structures. Currently availab...
Gjalt G. de Jong, Bill Lin, Carl Verdonck, Sven Wu...
Parallel random access memory, or PRAM, is a now venerable model of parallel computation that that still retains its usefulness for the design and analysis of parallel algorithms....