Recent studies on value locality reveal that many instructions are frequently executed with a small variety of inputs. This paper proposes an approach that integrates architecture...
Scratchpad memory (SPM), a fast software-managed onchip SRAM, is now widely used in modern embedded processors. Compared to hardware-managed cache, it is more efficient in perfor...
Multimedia vector instruction sets are becoming ubiquitous in most of the embedded systems used for multimedia, networking and communications. However, current compiler technology...
The advent of networking technologies and high performance transport protocols facilitates the service of storage over networks. However, they pose challenges in integration and i...
Jiesheng Wu, Pete Wyckoff, Dhabaleswar K. Panda, R...
Data parallel programs are sensitive to the distribution of data across processor nodes. We formulate the reduction of inter-node communication as an optimization on a colored gra...