Characterizing Uncertain Data using Compression

Motivated by sensor networks, mobility data, biology and life sciences, the area of mining uncertain data has recently received a great deal of attention. While various papers have focused on efficiently mining frequent patterns from uncertain data, the problem of discovering a small set of interesting patterns that provide an accurate and condensed description of a probabilistic database is still unexplored. In this paper we study the problem of discovering characteristic patterns in uncertain data through an information-theoretic lens. Adopting the possible worlds interpretation of probabilistic data and a compression scheme based on the MDL principle, we formalize the problem of mining patterns that compress the database well in expectation. Despite its huge search space, we show that this problem can be accurately approximated. In particular, we devise a sequence of three methods where each new method improves the memory requirements by orders of magnitude compared to its predecessor...
Added 17 Sep 2011
Updated 17 Sep 2011
Type Journal
Year 2011
Where SDM
Authors Francesco Bonchi, Matthijs van Leeuwen, Antti Ukkonen