—We present an application programming interface (API) for the C programming language that facilitates the development of dense linear algebra algorithms on graphics processors a...
The scalable parallel implementation, targeting SMP and/or multicore architectures, of dense linear algebra libraries is analyzed. Using the LU factorization as a case study, it is...
This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems (either shared-memory or distributed-memory). We use a tas...
This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her co...
The multiplication of the sparse multivariate polynomials using the recursive representations is revisited to take advantage on the multicore processors. We take care of the memor...