Abstract. Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of l...
This paper presents preliminary efforts to develop compilation and execution environments that achieve performance portability of multilevel parallelization on hierarchical archit...
Walden Ko, Mark N. Yankelevsky, Dimitrios S. Nikol...
Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardwaremanaged cache hierarchies, they employ software-managed embedde...
The excessive complexity of both machine architectures and applications have made it difficult for compilers to statically model and predict application behavior. This observatio...
Qing Yi, Keith Seymour, Haihang You, Richard W. Vu...
Modern microprocessor designs continue to obtain impressive performance gains through increasing clock rates and advances in the parallelism obtained via micro-architecture design...