— Architectural resources and program recurrences are the main limitations to the amount of Instruction-Level Parallelism (ILP) exploitable from loops, the most time-consuming pa...
In this paper, we present several novel strategies to improve software controlled cache utilization, so as to achieve lower power requirements for multi-media and signal processin...
Interactive direct visualization of 3D data requires fast update rates and the ability to extract regions of interest from the surrounding data. We have implemented MultiValued Cl...
Terry S. Yoo, Ulrich Neumann, Henry Fuchs, Stephen...
Multi-threaded commercial workloads implement many important internet services. Consequently, these workloads are increasingly used to evaluate the performance of uniprocessor and...
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However, a...