Sciweavers

MICRO
2010
IEEE

Throughput-Effective On-Chip Networks for Manycore Accelerators

13 years 2 months ago
Throughput-Effective On-Chip Networks for Manycore Accelerators
As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network design. This paper explores throughput-effective network-on-chips (NoC) for future manycore accelerators that employ bulk-synchronous parallel (BSP) programming models such as CUDA and OpenCL. A hardware optimization is "throughput-effective" if it improves parallel application level performance per unit chip area. We evaluate performance of future looking workloads using detailed closed-loop simulations modeling compute nodes, NoC and the DRAM memory system. We start from a mesh design with bisection bandwidth balanced with off-chip demand. Accelerator workloads tend to demand high off-chip memory bandwidth which results in a manyto-few traffic pattern when coupled with expected technology constraints of slow growth in pins-per-chip. Leveraging these observations we reduce NoC area by proposing a &quo...
Ali Bakhoda, John Kim, Tor M. Aamodt
Added 14 Feb 2011
Updated 14 Feb 2011
Type Journal
Year 2010
Where MICRO
Authors Ali Bakhoda, John Kim, Tor M. Aamodt
Comments (0)