A Large-Scale Study of File-System Contents

We collect and analyze a snapshot of data from 10,568 file systems of 4801 Windows personal computers in a commercial environment. The file systems contain 140 million files totaling 10.5 TB of data. We develop analytical approximations for distributions of file size, file age, file functional lifetime, directory size, and directory depth, and we compare them to previously derived distributions. We find that file and directory sizes are fairly consistent across file systems, but file lifetimes vary widely and are significantly affected by the job function of the user. Larger files tend to be composed of blocks sized in powers of two, which noticeably affects their size distribution. File-name extensions are strongly correlated with file sizes, and extension popularity varies with user job function. On average, file systems are only half full.
Keywords File-system contents, directory hierarchy, static data snapshot, workload characterization, analytical modeling.
John R. Douceur, William J. Bolosky
Added 03 Aug 2010
Updated 03 Aug 2010
Type Conference
Year 1999
Authors John R. Douceur, William J. Bolosky