We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Contr...
Abstract— The optimal size of a large on-chip cache can be different for different programs: at some point, the reduction of cache misses achieved when increasing cache size hits...
Domingo Benitez, Juan C. Moure, Dolores Rexachs, E...
In this paper we address the problem of code generation for basic blocks in heterogeneous memory-register DSP processors. We propose a new a technique, based on register-transfer ...
Abstract—Resizable caches can trade-off capacity for access speed to dynamically match the needs of the workload. In Simultaneous Multi-Threaded (SMT) cores, the caching needs ca...
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each node has a scalar data value, and the goal...