As multiprocessor sizes scale and computer architects turn to interconnection networks with non-uniform communication latencies, the lure of exploiting communication locality to i...
On SMP clusters, mixed mode collective MPI communications, which use shared memory communications within SMP nodes and point-to-point communications between SMP nodes, are more eff...
Meng-Shiou Wu, Ricky A. Kendall, Kyle Wright, Zhao...
Application of hardware-parameterized models to distributed systems can result in omission of key bottlenecks such as the full cost of inter- and intra-node communication in a c...
Modern microprocessors can achieve high performance on linear algebra kernels, but this currently requires extensive machine-specific hand tuning. We have developed a methodology wh...
Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, James...