The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-bas...
For most parallel and high-performance systems, tuning guides give users advice on optimizing the execution time of their programs. Execution time may be very sensitive...
As multicore architectures gain widespread use, it becomes increasingly important to be able to harness their additional processing power to achieve higher performance. However, e...
David Zhang, Qiuyuan J. Li, Rodric Rabbah, Saman A...
A replay tool aiming to reproduce a program's execution interposes itself at an appropriate replay interface between the program and the environment. During recording, it log...
Ming Wu, Fan Long, Xi Wang, Zhilei Xu, Haoxiang Li...
Complex tensor contraction expressions arise in accurate electronic structure models in quantum chemistry, such as the Coupled Cluster method. Transformations using algeb...
Albert Hartono, Alexander Sibiryakov, Marcel Nooij...