ITCH: Information-Theoretic Cluster Hierarchies

Hierarchical clustering methods are widely used in scientific domains such as molecular biology, medicine, and economics. Despite the maturity of hierarchical clustering as a research field, we have identified four goals that previous methods do not yet fully satisfy: first, to guide the hierarchical clustering algorithm to identify only meaningful and valid clusters; second, to represent each cluster in the hierarchy by an intuitive description, e.g. a probability density function; third, to handle outliers consistently; and finally, to avoid difficult parameter settings. With ITCH, we propose a novel clustering method built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL), referred to as hMDL. When the hierarchical cluster structure is interpreted as a statistical model of the data set, it can be used for effective data compression by Huffman coding. Thus, the achievable compression rate indu...
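The link between a statistical model and Huffman coding mentioned in the abstract can be illustrated with a minimal sketch. This is not the ITCH algorithm or the hMDL criterion; it is a generic textbook Huffman construction, and the function names and the toy distribution `model` are assumptions made for illustration only. It shows that symbols the model considers more probable receive shorter codes.

```python
import heapq
import math
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code over probs."""
    if len(probs) == 1:
        # A single symbol still needs one bit to be transmitted.
        return {next(iter(probs)): 1}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside
        # them moves one level deeper, i.e. gains one bit.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


def expected_bits(probs, lengths):
    """Average code length per symbol under the model."""
    return sum(p * lengths[s] for s, p in probs.items())


def entropy_bits(probs):
    """Shannon entropy of the model in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical toy model: a discrete distribution over four symbols.
model = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(model)
print(lengths)
print(expected_bits(model, lengths), entropy_bits(model))
```

For this toy distribution the probabilities are powers of two, so the average Huffman code length equals the entropy of the model. In MDL-style approaches generally, a model that assigns higher probability to the observed data yields a shorter encoding of that data.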
Added 29 Jan 2011
Updated 29 Jan 2011
Type Journal
Year 2010
Where PKDD
Authors Christian Böhm, Frank Fiedler, Annahita Oswald, Claudia Plant, Bianca Wackersreuther, Peter Wackersreuther