Performance analysis of high performance systems is a difficult task. Current tools have proven successful in analysis tasks but their implementation is limited in several respects...
Speculative thread-level parallelism has been recently proposed as an alternative source of parallelism that can boost the performance for applications where independent threads a...
Although multicomputers are becoming feasible for solving large problems, they are difficult to program: Extraction of parallelism from scalar languages is possible, but limited....
In this study, we introduce two improved assessment metrics of multiobjective optimizers, Nondominated Ratio and Spacing Distribution, and analyze their rationality and validity. ...
Maoguo Gong, Licheng Jiao, Haifeng Du, Ronghua Sha...
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addi...
Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, ...