We build an analytical model for an application utilizing master-slave paradigm. In the model, only three architecture parameters are used: latency, bandwidth and flop rate. Instea...
Applications must scale well to make efficient use of even medium-scale parallel systems. Because scaling problems are often difficult to diagnose, there is a critical need for sc...
Nathan R. Tallent, John M. Mellor-Crummey, Michael...
Solving large, irregular graph problems efficiently is challenging. Current software systems and commodity multiprocessors do not support fine-grained, irregular parallelism wel...
Guojing Cong, Sreedhar B. Kodali, Sriram Krishnamo...
This paper describes a source to source compilation tool for optimizing MPI-based parallel applications. This tool is able to automatically apply a “prepushing” transformation...
Loop distribution is one of the most useful techniques to reduce the execution time of parallel applications. Traditionally, loop scheduling algorithms are implemented based on pa...