Modern DRAMs have multiple banks to serve multiple memory requests in parallel. However, when two requests go to the same bank, they have to be served serially, exacerbating the h...
Yoongu Kim, Vivek Seshadri, Donghyuk Lee, Jamie Li...
Due to multi-core processors, the importance of parallel workloads has increased considerably. However, manycore chips demand new interconnection strategies, since traditio...
Henrique Cota de Freitas, Lucas Mello Schnorr, Mar...
We introduce a new performance metric, called Load Balancing Factor (LBF), to assist programmers with evaluating different tuning alternatives. The LBF metric differs from traditi...
Efficient determination of processing termination at barrier synchronization points can occupy an important role in the overall throughput of parallel and distributed computing sy...
This paper presents a new approach for analyzing the performance of grid scheduling algorithms for tasks with dependencies. Finding the optimal procedures for DAG scheduling in Gr...