Overlapping communication with computation is an important optimization on current cluster architectures; its importance is likely to increase as the doubling of processing power ...
Wei-Yu Chen, Dan Bonachea, Costin Iancu, Katherine...
Applications need to become more concurrent to take advantage of the increased computational power provided by chip level multiprocessing. Programmers have traditionally managed t...
Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hu...
Communication subsystems are used in high-performance parallel computing systems to abstract the lower network layer. By using a communication subsystem, an upper middleware librar...
Programmers and users of compute intensive scientific applications often do not want to (or even cannot) code load balancing and fault tolerance into their programs. The PBEAM syst...
Simulating chip-multiprocessor systems (CMP) can take a long time. For single-threaded workloads, earlier work has shown the utility of phase analysis, that is identification of r...
Jeffrey Namkung, Dohyung Kim, Rajesh K. Gupta, Igo...