The ROMIO implementation of the MPI-IO standard provides a portable infrastructure for use on top of a variety of underlying storage targets. These targets vary widely in their ca...
Sparse matrix operations achieve only small fractions of peak CPU speeds because of the use of specialized, indexbased matrix representations, which degrade cache utilization by i...
We describe and evaluate a new, pipelined algorithm for large, irregular all-gather problems. In the irregular all-gather problem each process in a set of processes contributes in...
As high-end computing systems continue to grow in scale, recent advances in multiand many-core architectures have pushed such growth toward more denser architectures, that is, mor...
Pavan Balaji, Darius Buntinas, David Goodell, Will...
With processor speeds no longer doubling every 18-24 months owing to the exponential increase in power consumption and heat dissipation, modern HEC systems tend to rely lesser on ...
Pavan Balaji, Anthony Chan, William Gropp, Rajeev ...