Shared-memory provides a uniform and attractive mechanism for communication. For efficiency, it is often implemented with a layer of interpretive hardware on top of a message-pas...
A wide class of geometry processing and PDE resolution methods needs to solve a linear system, where the non-zero pattern of the matrix is dictated by the connectivity matrix of th...
Deeply pipelined high performance processors require highly accurate branch prediction to drive their instruction fetch. However there remains a class of events which are not easi...
Abstract. Performance of the on-chip cache is critical for processor. The multithread program model usually employed by on-chip many-core architectures may have effects on cache ac...
Basic data flow patterns which we call idioms, such as stream, transpose, reduction, random access and stencil, are common in scientific numerical applications. We hypothesize tha...
Jiahua He, Allan Snavely, Rob F. Van der Wijngaart...