Abstract--One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cacheaware (CA) and cache-oblivious (CO) algorithms take into...
Strassen’s matrix multiplication (MM) has benefits with respect to any (highly tuned) implementations of MM because Strassen’s reduces the total number of operations. Strasse...
There is increasing interest in using general-purpose operating systems, such as Linux, on embedded platforms. It is especially important in embedded systems to use memory effici...
Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due pr...
Shoaib Kamil, Kaushik Datta, Samuel Williams, Leon...
The LAPACK software project currently under development is intended to provide a portable linear algebra library for high performance computers. LAPACK will make use of the Level 1...