In this paper we present an automated way of using spare CPU resources within a shared memory multi-processor or multi-core machine. Our approach is (i) to profile the execution o...
This paper presents a mathematical framework to exploit the semantic properties of matrix operations in loop-based numerical codes. The heart of this framework is an algebraic lan...
Targeted optimization of program segments can provide an additional program speedup over the highest default optimization level, such as -O3 in GCC. The key challenge is how to au...
Haiping Wu, Eunjung Park, Mihailo Kaplarevic, Ying...
We further increase the efficiency of Java RMI programs. Where other optimizing re-implementations of RMI use pre-processors to create stubs and skeletons and to create class spe...
The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs parallel search of...