Kernel summations are a ubiquitous, key computational bottleneck in many data analysis methods. In this paper, we attempt to marry, for the first time, the best relevant technique...
Dongryeol Lee, Richard W. Vuduc, Alexander G. Gray
This paper presents the design and implementation of XenSocket, a UNIX-domain-socket-like construct for high-throughput interdomain (VM-to-VM) communication on the same system. The...
We consider the multiplication of a sparse N × N matrix A with a dense N × N matrix B in the I/O model. We determine the worst-case non-uniform complexity of this task up to a c...
Although unstructured mesh algorithms are a popular means of solving problems across a broad range of disciplines—from texture mapping to computational fluid dynamics—they ar...
Brian S. White, Sally A. McKee, Bronis R. de Supin...
As the frequency gap between main memory and modern microprocessors grows, the implementation and efficiency of on-chip caches become more important. The growing latency to memory ...
Ryan Rakvic, Bryan Black, Deepak Limaye, John Paul...