The emergence of grid and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. This pa...
Buffered CoScheduled (BCS) MPI is a novel implementation of MPI based on global synchronization of all system activities. BCS-MPI imposes a model where all processes and their com...
To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing ha...
Eric S. Chung, Peter A. Milder, James C. Hoe, Ken ...
Efficient and robust metacomputing requires the decomposition of complex jobs into tasks that must be scheduled on distributed processing nodes. There are various ways of creating...
In this paper, we describe a compilation system that automates much of the process of performance tuning that is currently done manually by application programmers interested in h...
Nastaran Baradaran, Jacqueline Chame, Chun Chen, P...