Recently, energy has become an important issue in highperformance computing. For example, supercomputers that have energy in mind, such as BlueGene/L, have been built; the idea is...
Vincent W. Freeh, Feng Pan, Nandini Kappiah, David...
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
Execution and communication traces are central to performance modeling and analysis. Since the traces can be very long, meaningful compression and extraction of representative beha...
Multipartitioning is a skewed-cyclic block distribution that yields better parallel efficiency and scalability for line-sweep computations than traditional block partitionings. Th...
Effective overlap of computation and communication is a well understood technique for latency hiding and can yield significant performance gains for applications on high-end compu...
Aniruddha G. Shet, P. Sadayappan, David E. Bernhol...