High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...
Tw o parallel programming models represented b y OpenMP and MPI are compared for PDE solvers based on regular sparse numerical operators. As a typical representative of such an app...
Abstract. The performance of shared-memory (OpenMP) implementations of three different PDE solver kernels representing finite difference methods, finite volume methods, and spectra...
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
We present a system for describing and solving closed queuing network models of the memory access performance of NUMA architectures. The system consists of a model description lan...