An Empirical Comparison of NML Clustering Algorithms

Clustering can be defined as a data assignment problem where the goal is to partition the data into nonhierarchical groups of items. In our previous work, we suggested an information-theoretic criterion, based on the minimum description length (MDL) principle, for defining the goodness of a clustering of data. The basic idea behind this framework is to optimize the total code length over the data by encoding together data items belonging to the same cluster. In this setting, efficient coding is possible only by exploiting underlying regularities that are common to the members of a cluster, which means that this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined using the intuitively appealing universal normalized maximum likelihood (NML) code, which has been shown to produce optimal code lengths in the worst-case sense. In this paper, we focus on the optimization aspect of the clusterin...
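The code length idea described in the abstract can be illustrated with a small sketch. It is not the paper's exact criterion: it assumes a single categorical feature, and it scores a clustering by adding the multinomial NML code length of the cluster labels to the multinomial NML code length of the feature values inside each cluster. The function names (`multinomial_regret`, `nml_code_length`, `clustering_code_length`) are hypothetical, and the direct summation used for the two-valued case is only suitable for small sample sizes.

```python
import math


def multinomial_regret(n: int, k: int) -> float:
    """NML normalizing sum C(k, n) for a k-valued multinomial and n samples."""
    if n == 0 or k == 1:
        return 1.0
    # C(2, n): sum over the count h of the first value (0.0 ** 0 is 1.0 in Python).
    c2 = 0.0
    for h in range(n + 1):
        c2 += math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
    # Recurrence C(j, n) = C(j-1, n) + n / (j-2) * C(j-2, n) for j >= 3.
    prev, cur = 1.0, c2  # C(1, n), C(2, n)
    for j in range(3, k + 1):
        prev, cur = cur, cur + (n / (j - 2)) * prev
    return cur


def nml_code_length(counts: list[int]) -> float:
    """NML code length (in nats) of a sequence summarized by its value counts."""
    n = sum(counts)
    # Maximized log-likelihood of the observed counts.
    log_ml = sum(c * math.log(c / n) for c in counts if c > 0)
    return -log_ml + math.log(multinomial_regret(n, len(counts)))


def clustering_code_length(values, labels, n_values, n_clusters) -> float:
    """Simplified score: code length of the labels plus per-cluster code lengths."""
    label_counts = [0] * n_clusters
    value_counts = [[0] * n_values for _ in range(n_clusters)]
    for v, c in zip(values, labels):
        label_counts[c] += 1
        value_counts[c][v] += 1
    total = nml_code_length(label_counts)
    for counts in value_counts:
        total += nml_code_length(counts)
    return total


if __name__ == "__main__":
    values = [0, 0, 0, 0, 1, 1, 1, 1]
    good = [0, 0, 0, 0, 1, 1, 1, 1]  # clusters match the regularity in the data
    bad = [0, 1, 0, 1, 0, 1, 0, 1]   # clusters ignore it
    print(clustering_code_length(values, good, 2, 2))
    print(clustering_code_length(values, bad, 2, 2))
```

Under these assumptions, a partition whose clusters capture a shared regularity gets a shorter total code length than one that does not, which is the sense in which the criterion implies a similarity measure without defining one explicitly.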
Petri Kontkanen, Petri Myllymäki
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ITSL