Finding low-entropy sets and trees from binary data

The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occurrence of the value 1 in the data. While this choice makes sense in the context of sparse binary data, it disregards potentially interesting subsets of attributes that have some other type of dependency structure. We consider the problem of finding all subsets of attributes that have low complexity. The complexity is measured by either the entropy of the projection of the data on the subset, or the entropy of the data for the subset when modeled using a Bayesian tree, with downward or upward pointing edges. We show that the entropy measure on sets has a monotonicity property, and thus a levelwise approach can find all low-entropy itemsets. We also show that the tree-based measures are bounded above by the entropy of the corresponding itemset, allowing similar algorithms to be used for finding low-entropy trees...
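
The levelwise idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `entropy` and `low_entropy_sets`, the row-of-tuples data layout, and the `threshold` parameter are all assumptions made for illustration. It relies only on the monotonicity property stated above, namely that the empirical entropy of a projection cannot decrease when attributes are added, so any set exceeding the threshold can be pruned together with all its supersets. The tree-based measures are not covered here.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, attrs):
    """Empirical entropy (in bits) of the rows projected onto the columns in attrs."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def low_entropy_sets(rows, threshold):
    """Hypothetical levelwise (Apriori-style) search for attribute sets
    whose projection entropy is at most `threshold`.

    Returns a dict mapping each qualifying attribute tuple to its entropy.
    """
    n_attrs = len(rows[0])
    # Level 1: single attributes that are below the threshold.
    level = [(a,) for a in range(n_attrs) if entropy(rows, (a,)) <= threshold]
    result = {s: entropy(rows, s) for s in level}
    while level:
        prev = set(level)
        candidates = set()
        for s in level:
            # Extend each set with a larger attribute index to avoid duplicates.
            for a in range(s[-1] + 1, n_attrs):
                cand = s + (a,)
                # Monotonicity: every subset one size smaller must itself qualify.
                if all(sub in prev for sub in combinations(cand, len(s))):
                    candidates.add(cand)
        level = []
        for cand in sorted(candidates):
            h = entropy(rows, cand)
            if h <= threshold:
                result[cand] = h
                level.append(cand)
    return result
```

On a toy data set where two columns are identical and a third is independent, for example `[(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]` with a threshold of one bit, the pair of identical columns is kept while every set that combines either of them with the independent column is pruned.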

Eino Hinkkanen, Hannes Heikinheimo, Heikki Mannila

Data Mining | Entropy Measure | Interesting Subsets | KDD 2007 | Sparse Binary Data |

» Homogeneous String Segmentation using Trees and Weighted Independent Sets

» On Efficient Construction of Decision Trees from Large Databases

» Finding an optimal tree searching strategy in linear time

» Human motion database with a binary tree and node transition graphs

» An O(n log n) Algorithm for the Maximum Agreement Subtree Problem for Binary Trees

» Approximating Optimal Binary Decision Trees

» PhyloMap an algorithm for visualizing relationships of large sequence data sets and its ap...

» Nearest Neighbor Search Using Additive Binary Tree

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2007
Where	KDD
Authors	Eino Hinkkanen, Hannes Heikinheimo, Heikki Mannila, Jouni K. Seppänen, Taneli Mielikäinen
