Memory system bottlenecks limit performance for many applications, and computations with strided access patterns are among the hardest hit. The streams used in such applications h...
Abstract—Recent years have seen a trend in using graphic processing units (GPU) as accelerators for general-purpose computing. The inexpensive, single-chip, massively parallel ar...
This paper focuses on the design of high performance VR applications. These applications usually involve various I/O devices and complex simulations. A parallel architecture or gri...
A class of parallel incomplete factorization preconditionings for the solution of large linear systems is investigated. The approach may be regarded as a generalized domain decompo...
Legacy applications are still widely spread. If a need to change deployment or update its functionality arises, it becomes difficult to estimate the performance impact of such modi...
Steffen Becker, Michael Hauck, Mircea Trifu, Klaus...