Some of the most challenging applications to parallelize scalably are the ones that present a relatively small amount of computation per iteration. Multiple interacting performanc...
The execution of an application on a high performance system requires parameters concerning the problem in hand, and those that determine the system mapping, to be specified by a ...
Darren J. Kerbyson, Efstathios Papaefstathiou, Gra...
Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly progr...
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarj...
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state of the art high-level synthesis tool AutoPilot from AutoESL, to...
The Multi-Threshold CMOS (MTCMOS) technology provides a solution to the high performance and low power design requirements of modern designs. While the low Vth transistors are use...
Hyo-Sig Won, Kyo-Sun Kim, Kwang-Ok Jeong, Ki-Tae P...