
IJCNN 2000, IEEE

Continuous Optimization of Hyper-Parameters

Many machine learning algorithms can be formulated as the minimization of a training criterion which involves (1) "training errors" on each training example and (2) some hyper-parameters, which are kept fixed during this minimization. When there is only a single hyper-parameter, one can easily explore how its value affects a model selection criterion (which is not the same as the training criterion, and is used to select hyper-parameters). In this paper we present a methodology to select many hyper-parameters that is based on the computation of the gradient of a model selection criterion with respect to the hyper-parameters. We first consider the case of a training criterion that is quadratic in the parameters. In that case, the gradient of the selection criterion with respect to the hyper-parameters is efficiently computed by back-propagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyper...
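To illustrate the quadratic case described in the abstract, the following is a minimal sketch, not the paper's code: ridge regression, where the training criterion is quadratic in the weights and is solved via a Cholesky factorization of the normal equations, and the gradient of a held-out validation error (the model selection criterion) with respect to the regularization hyper-parameter is obtained by automatic differentiation through that solve. All names, the toy data, and the log-parameterization of the hyper-parameter are illustrative assumptions.

```python
# Sketch only: gradient of a validation criterion w.r.t. a hyper-parameter,
# back-propagated through a Cholesky-based solve (quadratic training criterion).
import jax
import jax.numpy as jnp
from jax.scipy.linalg import cho_solve

def fit_ridge(log_lam, X_tr, y_tr):
    """Minimize ||X w - y||^2 + lam ||w||^2 via its normal equations,
    solved with a Cholesky factorization (the inner, quadratic problem)."""
    lam = jnp.exp(log_lam)                              # positivity via log-parameterization (assumption)
    A = X_tr.T @ X_tr + lam * jnp.eye(X_tr.shape[1])
    L = jnp.linalg.cholesky(A)                          # gradients flow back through this factorization
    return cho_solve((L, True), X_tr.T @ y_tr)

def val_loss(log_lam, X_tr, y_tr, X_va, y_va):
    """Model selection criterion: mean squared error on held-out data."""
    w = fit_ridge(log_lam, X_tr, y_tr)
    return jnp.mean((X_va @ w - y_va) ** 2)

# Gradient of the selection criterion with respect to the hyper-parameter.
grad_fn = jax.grad(val_loss)

# Toy data, purely for illustration.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
X = jax.random.normal(k1, (100, 10))
w_true = jax.random.normal(k2, (10,))
y = X @ w_true + 0.1 * jax.random.normal(k3, (100,))
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

log_lam = jnp.array(0.0)
for _ in range(50):                                     # plain gradient descent on the hyper-parameter
    log_lam = log_lam - 0.1 * grad_fn(log_lam, X_tr, y_tr, X_va, y_va)
```

In this sketch, automatic differentiation plays the role of the paper's explicit back-propagation through the Cholesky decomposition; the more general, non-quadratic case discussed in the abstract would instead rely on the implicit function theorem to differentiate the minimizer of the training criterion.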
Yoshua Bengio
Type Conference
Year 2000
Where IJCNN
Authors Yoshua Bengio