Dwarf: shrinking the PetaCube

Dwarf is a highly compressed structure for computing, storing, and querying data cubes. Dwarf identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high in dense areas of cubes, but suffix redundancy is significantly higher in sparse areas. Putting the two together fuses the exponential sizes of high-dimensional full cubes into a dramatically condensed data structure. Eliminating suffix redundancy yields an equally dramatic reduction in the computation of the cube, because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation among attributes in the cube. A petabyte 25-dimensional cube was shrunk this way to a 2.3 GB Dwarf cube in less than 20 minutes, a 1:400,000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure that requires no access to the fact table. What makes Dwarf practical is the automati...
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2002
Authors Yannis Sismanis, Antonios Deligiannakis, Nick Roussopoulos, Yannis Kotidis