As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been...
Brad Calder, Chandra Krintz, Simmi John, Todd M. A...
In conventional superscalar microarchitectures with partitioned integer and floating-point resources, all floating-point resources are idle during execution of integer programs....
S. Subramanya Sastry, Subbarao Palacharla, James E...
Several researchers have proposed algorithms for basic block reordering. We call these branch alignment algorithms. The primary emphasis of these algorithms has been on improving ...
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...
This paper presents a new unified design flow developed within the Perplexus project that aims to accelerate parallelizable data-intensive applications in the context of ubiquitous...