Coarse Grain Reconfigurable Architectures (CGRAs) promise high performance at high power efficiency. They fulfil this promise by keeping the hardware extremely simple, and movi...
Yongjoo Kim, Jongeun Lee, Aviral Shrivastava, Yunh...
Abstract. A program analysis tool can play an important role in helping users understand and improve OpenMP codes. Array privatization is one of the most effective ways to improve...
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
Several methods have been proposed in the literature for the distribution of data on distributed memory machines, either oriented to dense or sparse structures. Many of the real a...
The goal of cache management is to maximize data reuse. Collaborative caching provides an interface for software to communicate access information to hardware. In theory, it can o...