The performance of the MPI’s collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not give...
Sathish S. Vadhiyar, Graham E. Fagg, Jack Dongarra
Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects and due...
Mohammad Banikazemi, Vijay Moorthy, Dhabaleswar K....
The increasing demand for computational cycles is being met by the use of multi-core processors. Having large number of cores per node necessitates multi-core aware designs to ext...
Krishna Chaitanya Kandalla, Hari Subramoni, Gopala...
We introduce LogGOPSim--a fast simulation framework for parallel algorithms at large-scale. LogGOPSim utilizes a slightly extended version of the well-known LogGPS model in combin...
TACO (Topologies and Collections) is a template library that introduces the flavour of distributed data parallel processing by means of reusable topology classes and C++ s. This p...