With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread level parallelism from a sequential program has become crucial...
The advent of multicores presents a promising opportunity for exploiting fine grained parallelism present in programs. Programs parallelized in the above fashion, typically involv...
A large and increasing gap exists between processor and memory speeds in scalable cache-coherent multiprocessors. To cope with this situation, programmers and compiler writers mus...
This paper presents the first automatic scheme to allocate local (stack) data in recursive functions to scratch-pad memory (SPM) in embedded systems. A scratch-pad is a fast direct...
The effective use of processor caches is crucial to the performance of applications. It has been shown that cache misses are not evenly distributed throughout a program. In applic...