Processor architectures with tens to hundreds of arithmetic units are emerging to handle media processing applications. These applications, such as image coding, image synthesis, ...
Scott Rixner, William J. Dally, Brucek Khailany, P...
We consider distributed applications that continuously stream data across the network, where data needs to be aggregated and processed to produce a 'useful' stream of up...
Vibhore Kumar, Brian F. Cooper, Zhongtang Cai, Gre...
A key characteristic of today’s high performance computing systems is a physically distributed memory, which makes the efficient management of locality essential for taking adv...
This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that e...
Roxana Diaconescu, Lei Wang, Zachary Mouri, Matt C...
Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-perfo...