Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, pr...
Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or ‘workflow’, is a well-defined protocol, with a specific structure defin...
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
Abstract—Dynamic runtimes can simplify parallel programming by automatically managing concurrency and locality without further burdening the programmer. Nevertheless, implementin...
Richard M. Yoo, Anthony Romano, Christos Kozyrakis
We describe the use of component architecture in an area to which this approach has not been classically applied, the area of cluster system software. By "cluster system soft...
Narayan Desai, Rick Bradshaw, Ewing L. Lusk, Ralf ...