This work presents the first extensive study of singlenode performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consid...
Aparna Chandramowlishwaran, Samuel Williams, Leoni...
Abstract. We describe compiler and run-time optimisations for effective autoparallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by anno...
Message passing via MPI is widely used in singleprogram, multiple-data (SPMD) parallel programs. Existing data-flow frameworks do not model the semantics of message-passing SPMD ...
Michelle Mills Strout, Barbara Kreaseck, Paul D. H...
As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low level programming models (e.g. OpenMP, MPI, CUDA, ...
HiPerSAT, a C++ library and tools, processes EEG data sets with ICA (Independent Component Analysis) methods. HiPerSAT uses BLAS, LAPACK, MPI and OpenMP to achieve a high performa...
D. B. Keith, C. C. Hoge, Robert M. Frank, Allen D....