
IPPS
2007
IEEE

Storage Optimization for Large-Scale Distributed Stream Processing Systems

We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, incoming data and intermediate results may need to be stored to enable future analyses. The quantity of such data would dominate even the largest storage system. Thus, a mechanism is needed to keep the most useful data. One recently introduced approach is to employ retention value functions, which effectively assign each data object a value that changes over time [5]. Storage space is then reclaimed automatically by deleting data of lowest current value. In such large systems, there will naturally be multiple file systems available, each with different properties. Choosing the right file system for a given incoming data stream presents a challenge. In this paper we provide a novel and effective scheme for optimizing the placement of data within a distributed storage subsystem employing retention value functions. The goal is to keep the data of hig...
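To make the retention-value idea concrete, the following is a minimal sketch of value-based reclamation on a single store: each object carries a value function of its age, and when space runs short the objects with the lowest current value are deleted first. It is an illustration under stated assumptions, not the scheme from the paper. The exponential-decay shape, the names `StoredObject`, `exponential_decay`, and `reclaim`, and all numbers are hypothetical, and the paper's actual contribution (choosing among multiple file systems) is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StoredObject:
    name: str
    size: int                            # space consumed, arbitrary units
    created: float                       # arrival time
    value_at: Callable[[float], float]   # hypothetical retention value as a function of age


def exponential_decay(initial: float, half_life: float) -> Callable[[float], float]:
    """Assumed value shape: starts at `initial` and halves every `half_life` time units."""
    return lambda age: initial * 0.5 ** (age / half_life)


def reclaim(objects: List[StoredObject], capacity: int,
            now: float) -> Tuple[List[StoredObject], List[StoredObject]]:
    """Delete objects of lowest current value until the rest fit in `capacity`."""
    # Sort ascending by the value each object has right now.
    kept = sorted(objects, key=lambda o: o.value_at(now - o.created))
    used = sum(o.size for o in kept)
    deleted = []
    while kept and used > capacity:
        victim = kept.pop(0)             # lowest current value goes first
        used -= victim.size
        deleted.append(victim)
    return kept, deleted


# Illustrative data: "a" starts out most valuable but decays quickly.
objects = [
    StoredObject("a", 40, created=0.0, value_at=exponential_decay(10.0, 10.0)),
    StoredObject("b", 40, created=0.0, value_at=exponential_decay(4.0, 100.0)),
    StoredObject("c", 40, created=50.0, value_at=exponential_decay(3.0, 10.0)),
]
kept, deleted = reclaim(objects, capacity=80, now=50.0)
kept_names = [o.name for o in kept]
deleted_names = [o.name for o in deleted]
```

Because values change over time, the deletion order depends on when reclamation runs: at time 0 object "a" is worth more than "b", but by time 50 it has decayed below both other objects and is the one removed.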
Added 03 Jun 2010
Updated 03 Jun 2010
Type Conference
Year 2007
Where IPPS
Authors Kirsten Hildrum, Fred Douglis, Joel L. Wolf, Philip S. Yu, Lisa Fleischer, Akshay Katta