Classic loop unrolling increases the performance of sequential loops by reducing the overhead of the non-computational parts of the loop. Unfortunately, when the loop con...
Roger Ferrer, Alejandro Duran, Xavier Martorell, E...
Hierarchical algorithms such as multigrid applications form an important cornerstone for scientific computing. In this study, we take a first step toward evaluating parallel lan...
Bradford L. Chamberlain, Steven J. Deitz, Lawrence...
There is growing interest in run-time detection as parallel and distributed systems grow larger and more complex. This work targets run-time analysis of complex, interactive scien...
Current microprocessors incorporate techniques to exploit instruction-level parallelism (ILP). However, previous work has shown that these ILP techniques are less effective in rem...
The NPAC kernel runtime, developed in the Parallel Compiler Runtime Consortium (PCRC) project, is a runtime library with special support for the High Performance Fortran data model....
Bryan Carpenter, Geoffrey Fox, Donald Leskiw, Xiao...