Sciweavers

WWW
2004
ACM

The webgraph framework I: compression techniques

Studying Web graphs is often difficult because of their large size. Several recent proposals describe techniques for storing a Web graph in memory in limited space by exploiting the inner redundancies of the Web. The WebGraph framework is a suite of codes, algorithms and tools that aims to make it easy to manipulate large Web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other). WebGraph can compress the WebBase graph (118 Mnodes, 1 Glinks) to as little as 3.08 bits per link, and its transposed version to as little as 2.89 bits per link.
Categories and Subject Descriptors: E.2 [Data]: Data Storage Representations; E.4 [Data]: Coding and Information Theory; H.3 [Information Systems]: Information Storage and Retrieval
General Terms: Algorithms, Experimentation, Measurement
Keywords: Web graph, compression
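The abstract names referentiation and intervalisation without describing them. The following is a minimal illustrative sketch of the two ideas, not the WebGraph format or its API. It assumes that referentiation means describing one node's sorted successor list relative to another node's list, and that intervalisation means replacing runs of consecutive successor ids with (start, length) pairs. The function names, the `min_len` threshold and the sample lists are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def intervalise(successors: List[int], min_len: int = 3
                ) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split a sorted successor list into (start, length) runs of
    consecutive ids and a list of leftover residual ids.
    min_len is an assumed threshold below which a run is not worth
    storing as an interval."""
    intervals: List[Tuple[int, int]] = []
    residuals: List[int] = []
    i = 0
    while i < len(successors):
        j = i
        # Extend the run while the next id is exactly one larger.
        while j + 1 < len(successors) and successors[j + 1] == successors[j] + 1:
            j += 1
        run = j - i + 1
        if run >= min_len:
            intervals.append((successors[i], run))
        else:
            residuals.extend(successors[i:j + 1])
        i = j + 1
    return intervals, residuals


def referentiate(successors: List[int], reference: List[int]
                 ) -> Tuple[List[bool], List[int]]:
    """Describe `successors` relative to a reference list: one flag per
    reference entry (copied or not) plus the ids absent from the reference."""
    succ = set(successors)
    ref = set(reference)
    copy_mask = [r in succ for r in reference]
    extras = [s for s in successors if s not in ref]
    return copy_mask, extras


def dereference(copy_mask: List[bool], extras: List[int],
                reference: List[int]) -> List[int]:
    """Rebuild the original sorted successor list from the relative form."""
    copied = [r for r, keep in zip(reference, copy_mask) if keep]
    return sorted(copied + extras)


if __name__ == "__main__":
    # Hypothetical successor lists of two nodes with similar out-links.
    reference = [15, 16, 17, 22, 23, 24, 315, 316]
    successors = [13, 15, 16, 17, 18, 19, 23, 24, 203]
    mask, extras = referentiate(successors, reference)
    print(mask, extras)
    print(intervalise(successors))
    assert dereference(mask, extras, reference) == successors
```

In this sketch the two steps are shown independently. A real encoder would also have to choose which node to use as the reference and how to write the flags, intervals and residuals as bits, which the abstract does not specify.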
Paolo Boldi, Sebastiano Vigna
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2004
Where WWW
Authors Paolo Boldi, Sebastiano Vigna