Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune ...
Todd Gamblin, Bronis R. de Supinski, Martin Schulz...
We apply a scalable approach for practical, comprehensive design space evaluation and optimization. This approach combines design space sampling and statistical inference to ident...
In this paper, we describe a set of compiler analyses and an implementation that automatically map a sequential and un-annotated C program into a pipelined implementation, targete...
Basic data flow patterns which we call idioms, such as stream, transpose, reduction, random access and stencil, are common in scientific numerical applications. We hypothesize tha...
Jiahua He, Allan Snavely, Rob F. Van der Wijngaart...
In this paper we present a paradigm to express streams and its implementation. Streams are a convenient mechanism to communicate multimedia data, for example video or audio, betwe...