Sciweavers

KDD
2006
ACM

Assessing data mining results via swap randomization

14 years 4 months ago
Assessing data mining results via swap randomization
The problem of assessing the significance of data mining results on high-dimensional 0?1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done by, e.g., chi-square tests, or many other methods. However, the results of such tests depend only on the specific attributes and not on the dataset as a whole. Moreover, the tests are more difficult to apply to sets of patterns or other complex results of data mining. In this paper, we consider a simple randomization technique that deals with this shortcoming. The approach consists of producing random datasets that have the same row and column margins with the given dataset, computing the results of interest on the randomized instances, and comparing them against the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, ...
Aristides Gionis, Heikki Mannila, Panayiotis Tsapa
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2006
Where KDD
Authors Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas, Taneli Mielikäinen
Comments (0)