Efficient performance tuning of parallel programs is often hard. In this paper we describe an approach that uses a uni-processor execution of a multithreaded program as reference ...
Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient run-time system for executing general as...
The performance attained by parallel programs executed on multiprocessor systems is largely in uenced both by the characteristics of the code and by those of the system architectu...
This paper presents the design and implementation of the MPI-IO interface for the Clusterfile parallel file system. The approach offers the opportunity of achieving a high corelat...
Assembly lines with closed loop parallel lanes have the potential to continue to be productive when individual stations breakdown. A requirement in such parallel lane systems is t...