Memory models like SC, TSO, and PC enforce load-load ordering, requiring that loads from any single thread appear to occur in program order to all other threads. Out-of-order execu...
The traditional virtual memory system is designed for decades assuming a magnetic disk as the secondary storage. Recently, flash memory becomes a popular storage alternative for ma...
In this paper, the effect of switch design on the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole rout...
Laxmi N. Bhuyan, Hu-Jun Wang, Ravi R. Iyer, Akhile...
In multimedia and other streaming applications a significant portion of energy is spent on data transfers. Exploiting data reuse opportunities in the application, we can reduce th...
Ilya Issenin, Erik Brockmeyer, Miguel Miranda, Nik...
We present a cross-layer customization methodology for latency and bandwidth efficient inter-core communication in embedded multiprocessors. The methodology integrates compiler, o...