High performance computing (HPC) is gaining popularity in solving scientific applications. Using the current programming standards, however, it takes an HPC expert to efficiently ...
The scalable parallel implementation, targeting SMP and/or multicore architectures, of dense linear algebra libraries is analyzed. Using the LU factorization as a case study, it is...
This paper presents a hardware-optimized variant of the well-known Gaussian elimination over GF(2) and its highly efficient implementation. The proposed hardware architecture, we...
We present lazy and forgetful algorithms for adding, multiplying and dividing multivariate polynomials. The lazy property allows us to compute the i-th term of a polynomial withou...