Scalable training of L1-regularized log-linear models


The L-BFGS limited-memory quasi-Newton method is the algorithm of choice for optimizing the parameters of large-scale log-linear models with L2 regularization, but it cannot be used for an L1-regularized loss, because that loss is non-differentiable whenever some parameter is zero. Efficient algorithms have been proposed for this task, but they are impractical when the number of parameters is very large. We present an algorithm, Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), based on L-BFGS, that can efficiently optimize the L1-regularized log-likelihood of log-linear models with millions of parameters. In our experiments on a parse reranking task, our algorithm was several orders of magnitude faster than an alternative algorithm, and substantially faster than L-BFGS on the analogous L2-regularized problem. We also present a proof that OWL-QN is guaranteed to converge to a globally optimal parameter vector.
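To make the non-differentiability at zero concrete, here is a minimal sketch of a pseudo-gradient of the kind orthant-wise methods typically use in place of the ordinary gradient. The abstract itself does not give this formula, so the function name `pseudo_gradient`, its arguments, and the exact rule are illustrative assumptions, not the authors' implementation. The objective is assumed to be `loss(x) + c * sum(|x_i|)`, where `grad_loss` is the gradient of the smooth loss term.

```python
def pseudo_gradient(x, grad_loss, c):
    """Sketch of a pseudo-gradient for loss(x) + c * sum(|x_i|).

    x         : list of parameter values
    grad_loss : gradient of the smooth loss term at x (same length as x)
    c         : L1 regularization strength (assumed c >= 0)
    """
    result = []
    for xi, gi in zip(x, grad_loss):
        if xi > 0:
            # Penalty c * |xi| is differentiable here with slope +c.
            result.append(gi + c)
        elif xi < 0:
            # Penalty is differentiable here with slope -c.
            result.append(gi - c)
        else:
            # At zero the penalty has a kink: one-sided slopes are -c and +c.
            left = gi - c
            right = gi + c
            if left > 0:
                # Objective still decreases when moving in the negative direction.
                result.append(left)
            elif right < 0:
                # Objective still decreases when moving in the positive direction.
                result.append(right)
            else:
                # Zero is a local minimizer along this coordinate.
                result.append(0.0)
    return result


example = pseudo_gradient(
    [1.0, -2.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.2, 2.0, -3.0],
    1.0,
)
print(example)
```

In this sketch, a parameter sitting at zero only gets a nonzero entry when the loss gradient is larger in magnitude than `c`, which is one way to see why L1 regularization tends to keep many parameters exactly at zero.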

Galen Andrew, Jianfeng Gao

Alternative Algorithm | ICML 2007 | Machine Learning | Optimal Parameter Vector | Parse Reranking Task

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2007
Where	ICML
Authors	Galen Andrew, Jianfeng Gao
