There has been intensive research on data prefetching focusing on performance improvement, however, the energy aspect of prefetching is relatively unknown. Our experiments show th...
Yao Guo, Saurabh Chheda, Israel Koren, C. Mani Kri...
This paper describes an algorithm that takes a trace (i.e., a sequence of numbers or vectors of numbers) as input, and from that produces a sequence of loop nests that, when run, ...
In this paper, we propose to implement hybrid operating systems based on two-level hardware interrupts. To separate real-time and non-real-time hardware interrupts by hardware, we...
Abstract. The shared-cache contention on Chip Multiprocessors causes performance degradation to applications and hurts system fairness. Many previously proposed solutions schedule ...
Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...