Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors’ greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant ...
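To illustrate what a "straightforward application of OpenMP" to a deep loop nest can look like, here is a minimal sketch in C. It is not taken from NWChem (which is written largely in Fortran) and is not the authors' kernel: the function name `contract`, the tensor shapes, and the row-major layout are all assumptions made for illustration. It computes c[i][j][k] += sum over l of a[i][j][l] * b[k][l], spreading the two outer loops across threads and asking the compiler to vectorize the innermost reduction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contraction kernel (illustrative only, not NWChem code):
 *   c[i][j][k] += sum_l a[i][j][l] * b[k][l]
 * All arrays are dense and row-major; b is stored as [k][l] so that the
 * innermost loop over l reads both operands with unit stride. */
void contract(long ni, long nj, long nk, long nl,
              const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* Thread-level parallelism: fuse the two outer loops into one
     * iteration space so many threads stay busy even when ni is small.
     * Each (i, j) pair writes a disjoint slice of c, so no locking is
     * needed. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (long i = 0; i < ni; i++) {
        for (long j = 0; j < nj; j++) {
            for (long k = 0; k < nk; k++) {
                double sum = 0.0;
                /* Data-level parallelism: vectorize the reduction. */
                #pragma omp simd reduction(+:sum)
                for (long l = 0; l < nl; l++) {
                    sum += a[(i * nj + j) * nl + l] * b[k * nl + l];
                }
                c[(i * nj + j) * nk + k] += sum;
            }
        }
    }
}
```

Built without OpenMP support the pragmas are ignored and the function runs serially with the same result; with an OpenMP-enabled compiler (for example `-fopenmp` on GCC) the thread count can be set through the `OMP_NUM_THREADS` environment variable.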

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker

Distributed And Parallel Computing | PPOPP 2015 |


Added	16 Apr 2016
Updated	16 Apr 2016
Type	Journal
Year	2015
Where	PPOPP
Authors	Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker
