Sciweavers

IPPS
2010
IEEE

Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities

13 years 2 months ago
Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities
Abstract--This paper explores the computation and communication overlap capabilities enabled by the new CORE-Direct hardware capabilities introduced in the InfiniBand (IB) Host Channel Adapter (HCA) ConnectX-2. These capabilities enable the progression and completion of data-dependent communications sequences to progress and complete at the network level without any Central Processing Unit (CPU) involvement. We use the latency dominated nonblocking barrier algorithm in this study, and find that at 64 process count, a contiguous time slot of about 80 percent of the nonblocking barrier time is available for computation. This time slot increases as the number of processes participating increases. In contrast, CPU based implementations provide a time slot of up to 30 percent of the nonblocking barrier time. This bodes well for the scalability of simulations employing offloaded collective operations. These capabilities can be used to reduce the effects of system noise, and when using nonblo...
Richard L. Graham, Stephen W. Poole, Pavel Shamis,
Added 13 Feb 2011
Updated 13 Feb 2011
Type Journal
Year 2010
Where IPPS
Authors Richard L. Graham, Stephen W. Poole, Pavel Shamis, Gil Bloch, Noam Bloch, Hillel Chapman, Michael Kagan, Ariel Shahar, Ishai Rabinovitz, Gilad Shainer
Comments (0)