Sciweavers

DMIN
2007

Towards Average Case Analysis of Itemset Mining

We perform a statistical analysis and describe the asymptotic behavior of the frequency and size distribution of δ-occurrent, minimal δ-occurrent, and maximal δ-occurrent itemsets occurring in random datasets across the entire spectrum of δ. We also describe the probability distribution of the support of an n-element itemset in a random dataset. We find that, for small values of δ relative to the number of transactions, the size distributions of δ-occurrent itemsets and maximal δ-occurrent itemsets can be approximated by the binomial distributions b(L, 1/(1 + 2^δ)) and b(L, 1/2^δ), respectively, where L is the inventory size. The ratio of minimal δ-occurrent and maximal δ-occurrent itemsets to the total number of δ-occurrent itemsets is low for small values of δ and rapidly approaches 1 as δ approaches the number of transactions. We also prove that the probability distribution of the support of an n-element itemset in a random k-transaction dataset is binomial of type b(k, 1/2^n).
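The last claim of the abstract can be checked numerically. The following is a minimal sketch, not the authors' method: it assumes that a "random dataset" means each item appears in each transaction independently with probability 1/2 (the abstract does not spell out the model), and the function names `binomial_pmf`, `simulate_support`, and `max_abs_gap` are hypothetical. It simulates the support of a fixed n-element itemset over k transactions and compares the empirical distribution with b(k, 1/2^n).

```python
import random
from math import comb


def binomial_pmf(k, p, s):
    """Probability of exactly s successes in k trials with success rate p."""
    return comb(k, s) * p**s * (1 - p) ** (k - s)


def simulate_support(k, n, trials, seed=0):
    """Empirical distribution of the support of one fixed n-element itemset.

    Assumption (not stated in the abstract): every item is present in every
    transaction independently with probability 1/2.
    """
    rng = random.Random(seed)
    counts = [0] * (k + 1)
    for _ in range(trials):
        support = 0
        for _ in range(k):
            # A transaction supports the itemset only if all n items are present.
            if all(rng.random() < 0.5 for _ in range(n)):
                support += 1
        counts[support] += 1
    return [c / trials for c in counts]


def max_abs_gap(k, n, trials=20000, seed=0):
    """Largest pointwise gap between the simulation and b(k, 1/2^n)."""
    empirical = simulate_support(k, n, trials, seed)
    p = 1 / 2**n
    return max(abs(empirical[s] - binomial_pmf(k, p, s)) for s in range(k + 1))


if __name__ == "__main__":
    # Example: a 2-element itemset in a 10-transaction dataset.
    print(max_abs_gap(k=10, n=2))
```

Under the stated assumption, a single transaction contains all n items with probability 1/2^n, and the k transactions are independent, so the gap should shrink toward zero as the number of trials grows.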
Dan Singer, David J. Haglin, Anna M. Manning
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference