The effective use of processor caches is crucial to the performance of applications. It has been shown that cache misses are not evenly distributed throughout a program. In applic...
Abstract - A multiprocessor system capable of exploiting fine-grained parallelism must support efficient synchronization and data passing mechanisms. This paper demonstrates the us...
Processor cycle time continues to decrease faster than main memory access times, placing higher demands on cache memory hierarchy performance. To meet these demands, conventional ...
Alvin R. Lebeck, David R. Raymond, Chia-Lin Yang, ...
Dataflow programming has proven to be popular for representing applications in rapid prototyping tools for digital signal processing (DSP); however, existing dataflow design tools...
This paper describes several methods for improving the scalability of memory disambiguation hardware for future high ILP processors. As the number of in-flight instructions grows...