Modern DRAMs have multiple banks to serve multiple memory requests in parallel. However, when two requests go to the same bank, they have to be served serially, exacerbating the h...
Yoongu Kim, Vivek Seshadri, Donghyuk Lee, Jamie Li...
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity ON , where 2 3. We show that such an algorithm can be parallelize...
—As parallel file systems span larger and larger numbers of nodes in order to provide the performance and scalability necessary for modern cluster applications, the need for fau...
Advances in multiprocessor interconnect technologyare leading to high performance networks. However, software overheadsassociated with message passing are limiting the processors ...
Debashis Basak, Dhabaleswar K. Panda, Mohammad Ban...
networking with a layer 2 abstraction provides a powerful model for virtualized wide-area distributed computing resources, including for high performance computing (HPC) on collec...
Lei Xia, Zheng Cui, John R. Lange, Yuan Tang, Pete...