The size and speed of first-level caches and SRAMs of embedded processors continue to increase in response to demands for higher performance. In power-sensitive devices like PDAs a...
Abstract. Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications in distributed shared memor...
Stephen Somogyi, Thomas F. Wenisch, Nikolaos Harda...
Abstract. In this paper, we propose a processor architecture with programmable on-chip memory for a high-performance SMP (symmetric multi-processor) node named SCIMA-SMP (Software ...
The Opie Project aims to develop a compiler to transform C codes written for row-major matrix representation into equivalent codes for Morton-order matrix representation, and to a...
Abstract. The large and growing impact of memory hierarchies on overall system performance compels designers to investigate innovative techniques to improve memory-system efficienc...