This paper describes an implementation of parallel LU factorization. The focus is to achieve high performance on non-dedicated clusters, where the number of available computing re...
The progression of implementation technologies into the sub-100 nanometer lithographies renew the importance of understanding and protecting against single-event upsets in digital...
Nicholas J. Wang, Justin Quek, Todd M. Rafacz, San...
A recent trend in modern high performance computing (HPC) system architectures employs “lean” compute nodes running a lightweight operating system (OS). Certain parts of the OS...
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...
Concurrency fault are difficult to find because they usually occur under specific thread interleavings. Fault-detection tools in this area find data-access patterns among threa...