In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
As multicore architectures gain widespread use, it becomes increasingly important to be able to harness their additional processing power to achieve higher performance. However, e...
David Zhang, Qiuyuan J. Li, Rodric Rabbah, Saman A...
Apply is a Domain-Specific Language for image processing and low-level computer vision. Apply allows programmers to write kernel operations that focus on the computation for a sin...
Loop unrolling is a well-known compiler optimization that can lead to significant performance improvements. When used in High Level Synthesis (HLS) unrolling can affect the contr...
As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attrac...
Michael I. Gordon, William Thies, Saman P. Amarasi...