Accurate, reproducible and comparable measurement of collective operations is a complicated task. Although Different measurement schemes are implemented in wellknown benchmarks, m...
In a shared-memory multiprocessor system, it is possible that certain synchronization operations are redundant -that is, their corresponding sequencing requirements are enforced c...
Shuvra S. Bhattacharyya, Sundararajan Sriram, Edwa...
This paper presents a case study about the applicability and usage of non blocking collective operations. These operations provide the ability to overlap communication with computa...
Torsten Hoefler, Peter Gottschling, Andrew Lumsdai...
—Atomic operations are important building blocks in supporting general-purpose computing on graphics processing units (GPUs). For instance, they can be used to coordinate executi...
As shared-memory multiprocessors become the dominant commodity source of computation, parallelizing compilers must support mainstream computations that manipulate irregular, point...