In LAPACK, many matrix operations are cast as block algorithms that iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
In this paper, we propose a fast image deconvolution algorithm that combines adaptive block thresholding and Vaguelet-Wavelet Decomposition. The approach consists in first denoisi...
One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into...
The back ends of today's Internet services rely heavily on caching at various layers, both to provide faster service to common requests and to reduce load on back-end components. ...
Alexander Rasmussen, Emre Kiciman, V. Benjamin Liv...
Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-mem...
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay ...