Sciweavers

FAST
2011

Just-in-Time Analytics on Large File Systems

As file systems reach the petabyte scale, users and administrators are increasingly interested in acquiring high-level analytical information for file management and analysis. Two particularly important tasks are the processing of aggregate and top-k queries, which, unfortunately, cannot be answered quickly by hierarchical file systems such as ext3 and NTFS. Existing pre-processing-based solutions, e.g., file system crawling and index building, consume a significant amount of time and space (for generating and maintaining the indexes), which in many cases cannot be justified by the infrequent usage of such solutions. In this paper, we advocate that user interests can often be sufficiently satisfied by approximate (i.e., statistically accurate) answers. We develop Glance, a just-in-time sampling-based system which, after consuming a small number of disk accesses, is capable of producing extremely accurate answers for a broad class of aggregate and top-k queries over a file syste...
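
The abstract does not spell out how the sampling works, so the following is only an illustrative sketch of how a sampling-based estimator can answer an aggregate query (here, the total number of files) without crawling the whole tree. It uses a generic Knuth-style random descent with inverse-probability weighting, which is not necessarily Glance's algorithm. The nested-dict tree representation, `TOY_TREE`, and all function names are assumptions made for the example.

```python
import random

# Hypothetical in-memory stand-in for a file system:
# a dict is a directory, any other value is a file (its size in bytes).
TOY_TREE = {
    "a.txt": 10,
    "docs": {"b.txt": 5, "c.txt": 7},
    "src": {"d.py": 3, "lib": {"e.py": 1, "f.py": 2, "g.py": 4}},
}


def estimate_file_count(tree, rng):
    """Run one random descent and return an unbiased estimate of the file count.

    At each directory we count its own files, scaled by the inverse of the
    probability of having reached it, then follow one subdirectory chosen
    uniformly at random.
    """
    estimate = 0.0
    weight = 1.0  # inverse of the probability of reaching the current directory
    node = tree
    while True:
        files = [v for v in node.values() if not isinstance(v, dict)]
        subdirs = [v for v in node.values() if isinstance(v, dict)]
        estimate += weight * len(files)
        if not subdirs:
            return estimate
        weight *= len(subdirs)
        node = rng.choice(subdirs)


def average_estimate(tree, descents, seed=0):
    """Average several independent descents to reduce the variance."""
    rng = random.Random(seed)
    total = sum(estimate_file_count(tree, rng) for _ in range(descents))
    return total / descents


def exact_file_count(tree):
    """Full crawl, used here only to check the estimator on the toy tree."""
    return sum(
        exact_file_count(v) if isinstance(v, dict) else 1 for v in tree.values()
    )
```

Each descent touches only one directory per level, which is the sense in which such an estimator needs few disk accesses; averaging more descents trades extra accesses for lower variance. On a real file system the dictionary lookups would be replaced by directory listings.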
Added 28 Aug 2011
Updated 28 Aug 2011
Type Conference
Year 2011
Where FAST
Authors H. Howie Huang, Nan Zhang 0004, Wei Wang, Gautam Das, Alexander S. Szalay