This paper presents COBRA (Continuous Binary ReAdaptation), a runtime binary optimization framework, for multithreaded applications. It is currently implemented on Itanium 2 based...
This paper presents preliminary efforts to develop compilation and execution environments that achieve performance portability of multilevel parallelization on hierarchical archit...
Walden Ko, Mark N. Yankelevsky, Dimitrios S. Nikol...
Measurement-based profiling introduces intrusion in program execution. Intrusion effects can be mitigated by compensating for measurement overhead. Techniques for compensation anal...
Effective overlap of computation and communication is a well understood technique for latency hiding and can yield significant performance gains for applications on high-end compu...
Aniruddha G. Shet, P. Sadayappan, David E. Bernhol...
MPICH2 provides a layered architecture for implementing MPI-2. In this paper, we provide a new design for implementing MPI-2 over InfiniBand by extending the MPICH2 ADI3 layer. Ou...