Characterising the difference

Characterising the differences between two databases is a frequently occurring problem in data mining. Detecting change over time is a prime example; comparing databases from two branches is another. The key problem is to discover the patterns that describe the difference. Emerging patterns provide only a partial answer to this question. In previous work, we showed that the data distribution can be captured in a pattern-based model using compression [12]. Here, we extend this approach to define a generic dissimilarity measure on databases. Moreover, we show that this approach can identify those patterns that characterise the differences between two distributions. Experimental results show that our method provides a well-founded way to independently measure database dissimilarity that allows for thorough inspection of the actual differences. This illustrates the use of our approach in real-world data mining. Categories and Subject Descriptors H.2.8. Data Mining; I.5.4. Similarity Mea...
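The idea of a compression-based dissimilarity measure can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the abstract: zlib with a preset dictionary stands in for the pattern-based model (it is a generic byte-level compressor, not the pattern-based compressor of the paper), the databases are plain byte strings with one transaction per line, and the formula that combines the encoded sizes is one plausible choice, not a statement of the paper's definition. The names `encoded_size`, `dissimilarity`, `branch_a` and `branch_b` are hypothetical.

```python
import zlib


def encoded_size(data: bytes, model: bytes) -> int:
    """Bytes needed for `data` when `model` is given to zlib as a preset dictionary.

    Assumption: the preset dictionary plays the role of a model "induced"
    from a database. This is a stand-in, not a pattern-based model.
    """
    compressor = zlib.compressobj(level=9, zdict=model)
    return len(compressor.compress(data) + compressor.flush())


def dissimilarity(x: bytes, y: bytes) -> float:
    """Relative cost of encoding each database with the other's model.

    Assumed formula: the larger of the two relative size increases.
    It is symmetric and yields 0.0 when x and y are identical.
    """
    own_x = encoded_size(x, x)    # x encoded with its own model
    own_y = encoded_size(y, y)    # y encoded with its own model
    cross_x = encoded_size(x, y)  # x encoded with y's model
    cross_y = encoded_size(y, x)  # y encoded with x's model
    return max((cross_x - own_x) / own_x, (cross_y - own_y) / own_y)


# Two toy "databases" (illustrative data only).
branch_a = b"bread milk\nbread butter jam\nmilk eggs flour\ntea sugar lemon\n"
branch_b = b"hammer nails\nscrews drill bits\npaint brush roller\nglue tape wire\n"

print(dissimilarity(branch_a, branch_a))
print(dissimilarity(branch_a, branch_b))
```

Under these assumptions a database compared with itself scores zero, and the score grows as one database's model becomes less useful for encoding the other. A generic compressor of this kind yields only a number; it does not expose the patterns that account for the difference, which is the part the abstract attributes to the pattern-based approach.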
Jilles Vreeken, Matthijs van Leeuwen, Arno Siebes
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2007
Where KDD