We develop an algorithm for parallel disk sorting whose I/O cost approaches the lower bound and that guarantees almost perfect overlap between I/O and computation. Previous algor...
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
The analysis of the runtime behavior of a software system yields vast amounts of information, making accurate interpretations difficult. Filtering or compression techniques are o...
As the last-level on-chip caches in chip-multiprocessors increase in size, the physical locality of on-chip data becomes important for delivering high performance. The non-uniform...
Large distributed systems such as Computational Grids require that a large amount of monitoring data be collected for a variety of tasks such as fault detection, performance analysis, ...
Brian Tierney, Brian Crowley, Dan Gunter, Mason Ho...