A scalable parallel algorithm has been designed to study long-time dynamics of many-atom systems based on the nudged elastic band method, which performs mutually constrained molec...
We present new communication-efficient parallel dense linear solvers: a solver for triangular linear systems with multiple right-hand sides and an LU factorization algorithm. Thes...
With faster graphics hardware comes the possibility of realizing more complex applications that require more detailed data and provide better presentation. The processors ke...
We present a high-performance algorithm for multiplying sparse distributed polynomials using a multicore processor. Each core uses a heap of pointers to multiply parts of the poly...
Fast Fourier transform (FFT) algorithms are used in a wide variety of digital signal processing applications, and many of these require high-performance parallel implementations...