Divide and conquer algorithms are a good match for modern parallel machines: they tend to have large amounts of inherent parallelism and they work well with caches and deep memory...
Multiple memory banks design is employed in many high performance DSP processors. This architectural feature supports higher memory bandwidth by allowing multiple data memory acce...
Chun Jason Xue, Tiantian Liu, Zili Shao, Jingtong ...
Existing dynamic race detectors suffer from at least one of the following three limitations: (i) space overhead per memory location grows linearly with the number of parallel thre...
On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring ...
Induprakas Kodukula, Keshav Pingali, Robert Cox, D...