Autotuning technology has emerged recently as a systematic process for evaluating alternative implementations of a computation, in order to select the best-performing solution for...
Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun...
The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tas...
Ying Chen, Frank K. H. A. Dehne, Todd Eavis, Andre...
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...
Anthony Danalis, Lori L. Pollock, D. Martin Swany,...
Abstract. Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...
1- A methodology is presented in this paper for determining an optimal set of clock path delays for designing high performance VLSI/ULSI-based clock distribution networks. This met...