This paper introduces a compiler-orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. ...
Rodric M. Rabbah, Hariharan Sandanagobalane, Mongk...
This paper presents a parallel algorithm for recursive query processing and shows how it can be efficiently implemented in a local computer network. The algorithm relies on an int...
For functional programs, unboxing aggregate data structures such as tuples removes memory indirections and frees dead components of the decoupled structures. To explore the consequ...
Optimizing array accesses is extremely critical in embedded computing as many embedded applications make use of arrays (in form of images, video frames, etc). Previous research co...
Guilin Chen, Mahmut T. Kandemir, A. Nadgir, Ugur S...
Despite the widespread and growing use of asynchronous copies to improve scalability, performance and availability, this practice still lacks a firm semantic foundation. Applicati...