Summarising Data by Clustering Items

15 years 3 months ago


Abstract. For a book, the title and abstract provide a good first impression of what to expect from it. For a database, getting a first impression is not so straightforward. While low-order statistics provide only limited insight, mining the data quickly provides too much detail. In this paper we propose a middle ground, and introduce a parameter-free method for constructing high-quality summaries for binary data. Our method builds a summary by grouping items that strongly correlate, and uses the Minimum Description Length principle to identify the best grouping, without requiring a distance measure between items. Besides offering a practical overview of which attributes interact most strongly, these summaries are also easily queried surrogates for the data. Experiments show that our method discovers high-quality results: correlated attributes are correctly grouped and the supports of frequent itemsets are closely approximated.
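To make the idea in the abstract concrete, the following is a minimal Python sketch of grouping binary items by description length and then querying the resulting summary for itemset supports. It is an illustration under stated assumptions, not the authors' method: the two-part encoding in `cluster_cost`, the greedy pairwise merge strategy, and all function names (`cluster_cost`, `greedy_mdl_clustering`, `summarise`, `estimate_support`) are hypothetical choices made here for readability. The abstract does not specify the paper's actual encoding or search procedure.

```python
import math
from collections import Counter
from itertools import combinations


def cluster_cost(rows, cluster):
    """Bits needed for one group of columns under a simplified, ASSUMED two-part code."""
    n = len(rows)
    counts = Counter(tuple(row[i] for i in cluster) for row in rows)
    # Data cost: Shannon-optimal code lengths for the observed value combinations.
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    # Model cost (assumption): list each observed combination (one bit per item)
    # together with its count (log2(n + 1) bits).
    model_bits = len(counts) * (len(cluster) + math.log2(n + 1))
    return data_bits + model_bits


def greedy_mdl_clustering(rows):
    """Start from singletons; keep merging the pair of groups that saves the most bits."""
    clustering = [(i,) for i in range(len(rows[0]))]
    while len(clustering) > 1:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(clustering, 2):
            merged = tuple(sorted(a + b))
            gain = cluster_cost(rows, a) + cluster_cost(rows, b) - cluster_cost(rows, merged)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge shortens the description: stop
            break
        a, b = best_pair
        clustering = [c for c in clustering if c not in (a, b)] + [tuple(sorted(a + b))]
    return sorted(clustering)


def summarise(rows, clustering):
    """The summary: one table of value-combination counts per group."""
    return {c: Counter(tuple(row[i] for i in c) for row in rows) for c in clustering}


def estimate_support(summary, n_rows, itemset):
    """Estimate the relative support of an itemset from the summary alone,
    treating different groups as independent of each other."""
    estimate = 1.0
    for cluster, counts in summary.items():
        positions = [k for k, item in enumerate(cluster) if item in itemset]
        if positions:
            hits = sum(c for combo, c in counts.items()
                       if all(combo[k] == 1 for k in positions))
            estimate *= hits / n_rows
    return estimate


# Toy data: columns 0 and 1 are identical, column 2 is independent of both.
rows = [(i % 2, i % 2, (i // 2) % 2) for i in range(40)]
clustering = greedy_mdl_clustering(rows)
summary = summarise(rows, clustering)
print(clustering)
print(estimate_support(summary, len(rows), {0, 1}))
print(estimate_support(summary, len(rows), {0, 2}))
```

On this toy data the sketch groups the two identical columns together and leaves the independent column on its own, and the estimated supports of the itemsets {0, 1} and {0, 2} match their true relative supports (0.5 and 0.25). No distance measure between items is needed: the only criterion is whether a merge shortens the total description.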

Michael Mampaey, Jilles Vreeken
