
DIS
2006
Springer

Optimal Bayesian 2D-Discretization for Variable Ranking in Regression

In supervised machine learning, variable ranking aims at sorting the input variables according to their relevance with respect to an output variable. In this paper, we propose a new relevance criterion for variable ranking in regression problems with a large number of variables. The criterion is based on a discretization of both the input and the output variables, derived as an extension of a Bayesian nonparametric discretization method for the classification case. To this end, we introduce a family of discretization grid models and a prior distribution defined on this model space. For this prior, we then derive the exact Bayesian model selection criterion. The resulting most probable grid partition of the data emphasizes the relation (or the absence of a relation) between each input and the output, and provides a ranking criterion for the input variables. Preliminary experiments on both synthetic and real data demonstrate the criterion's capacity to select the most relevant variables and to improve a regression tree...
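The general idea of ranking inputs by a 2D grid over each (input, output) pair can be sketched as follows. This is only an illustration under stated assumptions, not the criterion derived in the paper: it uses a fixed equal-frequency grid and the empirical mutual information of the grid cells as a stand-in score, whereas the abstract describes an optimized grid selected by an exact Bayesian criterion. The function names (`equal_frequency_bins`, `grid_mutual_information`, `rank_variables`), the bin count, and the synthetic data are all hypothetical.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index in [0, n_bins) based on its rank.

    Ties are broken by sort order, which is adequate for continuous data.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def grid_mutual_information(x, y, n_bins=4):
    """Empirical mutual information (nats) of an n_bins x n_bins grid on (x, y).

    Stand-in score (assumption): the paper uses a Bayesian criterion instead.
    """
    n = len(x)
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    joint = Counter(zip(bx, by))  # cell counts of the 2D grid
    px = Counter(bx)              # row (input interval) counts
    py = Counter(by)              # column (output interval) counts
    mi = 0.0
    for (i, j), n_ij in joint.items():
        mi += (n_ij / n) * math.log(n_ij * n / (px[i] * py[j]))
    return mi


def rank_variables(columns, y, n_bins=4):
    """Return (name, score) pairs sorted by decreasing grid score."""
    scores = {
        name: grid_mutual_information(col, y, n_bins)
        for name, col in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic example: y depends on x_relevant only (non-linearly).
rng = random.Random(0)
n = 400
x_relevant = [rng.uniform(-1, 1) for _ in range(n)]
x_noise = [rng.uniform(-1, 1) for _ in range(n)]
y = [xi ** 2 + rng.gauss(0, 0.05) for xi in x_relevant]

ranking = rank_variables({"x_relevant": x_relevant, "x_noise": x_noise}, y)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the score is computed on the discretized grid rather than on a linear fit, it can detect the quadratic dependence in this example, which a plain correlation coefficient would largely miss. A fixed grid tends to give noise variables a small positive score, which is one motivation for selecting the grid with a model selection criterion as the abstract describes.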
Marc Boullé, Carine Hue
Added 22 Aug 2010
Updated 22 Aug 2010
Type Conference
Year 2006
Where DIS
Authors Marc Boullé, Carine Hue