The Noise Component in Model-based Cluster Analysis

The so-called noise component was introduced by Banfield and Raftery (1993) to improve the robustness of cluster analysis based on the normal mixture model. The idea is to add a uniform distribution over the convex hull of the data as an additional mixture component. While this yields good results in many practical applications, the original proposal has some problems: 1) as shown by Hennig (2004), the method is not breakdown-robust; 2) the original approach does not define a proper ML estimator and does not have satisfactory asymptotic properties. We discuss two alternatives. The first replaces the uniform distribution with a fixed constant, modelling an improper uniform distribution that does not depend on the data. This can be proven to be more robust, although choosing the involved tuning constant is tricky. The second alternative is to approximate the ML-estimator of a mixture of normals with a uniform distribution more precisely than it i...
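As a rough illustration of the noise-component idea, here is a minimal sketch (not the authors' estimator, and not their more precise ML approximation) that fits a one-dimensional normal mixture plus a noise component by EM. It assumes univariate data, so the convex hull reduces to the interval [min, max] and the uniform density is 1/(max - min). Passing a fixed `noise_density` instead mimics the improper-uniform alternative, with the constant chosen by the user. The function name `em_gaussian_plus_noise` and all of its parameters are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gaussian_plus_noise(
    data: List[float],
    means: List[float],
    sigmas: List[float],
    noise_density: Optional[float] = None,
    n_iter: int = 100,
    min_sigma: float = 1e-3,
) -> Tuple[List[float], List[float], List[float], List[List[float]]]:
    """Hypothetical EM sketch for a 1-D normal mixture plus one noise component.

    noise_density=None: uniform density over the data's convex hull [min, max]
    (the data-dependent choice). A float: fixed improper-uniform constant.
    Returns (weights, means, sigmas, responsibilities); the last weight and the
    last entry of each responsibility row belong to the noise component.
    """
    n, k = len(data), len(means)
    if noise_density is None:
        lo, hi = min(data), max(data)
        if hi == lo:
            raise ValueError("data range is zero; convex hull is degenerate")
        noise_density = 1.0 / (hi - lo)
    means, sigmas = list(means), list(sigmas)
    weights = [1.0 / (k + 1)] * (k + 1)  # k normal components + 1 noise
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], sigmas[j]) for j in range(k)]
            dens.append(weights[k] * noise_density)  # constant noise density
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights for all components, means/sigmas for normals only.
        for j in range(k + 1):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if j < k and nj > 0:
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
                sigmas[j] = max(math.sqrt(var), min_sigma)  # avoid degenerate fits
    return weights, means, sigmas, resp
```

For example, with `data = [-0.3, -0.1, 0.0, 0.1, 0.2, 4.9, 5.0, 5.1, 5.3, 25.0, -30.0]` and starting means `[0.0, 5.0]`, the two far-out points should end up with noise responsibilities close to 1. In the default mode, moving an outlier further away widens [min, max] and lowers the noise density, whereas a fixed `noise_density` is unaffected; that data dependence is what the improper-uniform alternative removes.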
Christian Hennig, Pietro Coretto
Added: 07 Jun 2010
Updated: 07 Jun 2010
Type: Conference
Year: 2007
Where: GFKL