Rank-varying computational complexity describes those computations in which the complexity of executing each step is not a constant, but evolves throughout the computation as a fu...
Abstract. Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving ...
The fact that graphics processors (GPUs) are today’s most powerful computational hardware for the dollar has motivated researchers to utilize the ubiquitous and powerful GPUs fo...
This paper provides a description of our Network Time Interface (NTI) M-Module, which supports high-accuracy external clock synchronization by hardware. The NTI is built around our custo...
This paper discusses our experience with fine-grain synchronization for a variant of the preconditioned conjugate gradient method. This algorithm represents a large class of algo...