Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-...
This paper presents a performance study of a nonrigid registration algorithm for investigating lung disease on clusters. Our algorithm combines two conventional acceleration techn...
This paper presents a parallel adaptive version of the block-based Gauss-Jordan algorithm used in numerical analysis to invert matrices. This version includes a characterization o...
The Cray MTA-2 system provides exceptional performance on a variety of sparse graph algorithms. Unfortunately, it was an extremely expensive platform. Cray is preparing an Eldorad...
Keith D. Underwood, Megan Vance, Jonathan W. Berry...
This paper presents a self-adapting parallel package for computing the Walsh-Hadamard transform (WHT), a prototypical fast signal transform, similar to the fast Fourier transform....
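For readers unfamiliar with the transform named in the last abstract, the following is a minimal sketch of the textbook sequential fast Walsh-Hadamard transform in natural (Hadamard) order, unnormalized. It is not the self-adapting parallel package the abstract describes; the function name `fwht` and the choice of Python are assumptions made purely for illustration.

```python
def fwht(values):
    """Unnormalized Walsh-Hadamard transform in natural (Hadamard) order.

    Illustrative textbook butterfly only; `fwht` is a hypothetical name,
    not an API of the package described in the abstract.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)  # work on a copy so the input is left untouched
    h = 1
    while h < n:
        # Combine adjacent blocks of size h with sum/difference butterflies.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))
```

Like the FFT, this uses log2(n) butterfly stages over n elements, but with only additions and subtractions. Applying the transform twice returns the input scaled by n.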