In response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures. This paper presents a simulation-bas...
Vinodh Cuppu, Bruce L. Jacob, Brian Davis, Trevor ...
Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide or parallelize communication may improve the performance of ...
In many complex machine learning applications there is a need to learn multiple interdependent output variables, where knowledge of these interdependencies can be exploited to impr...
Ease of deployment, wireless connectivity and ubiquitous mobile on-the-go computing have made IEEE 802.11 the most widely deployed Wireless Local Area Network (WLAN) sta...
We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operation...
Andreas Moshovos, Dionisios N. Pnevmatikatos, Amir...