Lasso-type estimators for variable selection

Yves Grandvalet, Heudiasyc, UMR CNRS 6599,
Yves.Grandvalet@utc.fr

Stéphane Canu, PSI, INSA de Rouen,
stephane.canu@insa-rouen.fr

In generalized linear models, variable selection can be achieved by shrinkage operators such as the lasso (Least Absolute Shrinkage and Selection Operator). This estimator has two main advantages over subset selection methods: first, it can be computed by standard continuous optimization procedures; second, the estimate varies smoothly with the learning set and with the hyper-parameter setting. As a result, the method is stable with respect to slight changes in the data and to errors in hyper-parameter tuning. In a previous study, we showed that the lasso is equivalent to an adaptive ridge regression estimate, which can be motivated as an attempt to determine the best quadratic penalization for the problem at hand. A different form of this result may be presented from the viewpoint of variable selection: the lasso can be interpreted as the usual ridge estimate applied to an ``optimal'' transformation of the input variables. This viewpoint suggests several generalizations of lasso-type estimators for performing variable selection with more complex models such as additive models, neural networks, or kernel machines.
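
To make the stated equivalence concrete, the following display is a minimal sketch in the squared-error case, under notation we introduce here ($n$ observations, $p$ input variables, a global hyper-parameter $\lambda$); the abstract itself does not fix these symbols. The lasso constrains the $\ell_1$ norm of the coefficients, while the adaptive ridge assigns one quadratic penalty per variable and shares a fixed penalty budget across variables:
\[
  \hat{\boldsymbol\beta}^{\mathrm{lasso}}
  = \operatorname*{arg\,min}_{\boldsymbol\beta}\;
    \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2}
  \quad\text{subject to}\quad
    \sum_{j=1}^{p}\lvert\beta_j\rvert \le t ,
\]
\[
  \hat{\boldsymbol\beta}^{\mathrm{ar}}
  = \operatorname*{arg\,min}_{\boldsymbol\beta,\,\boldsymbol\lambda}\;
    \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2}
    + \sum_{j=1}^{p}\lambda_j\beta_j^{2}
  \quad\text{subject to}\quad
    \sum_{j=1}^{p}\frac{1}{\lambda_j} = \frac{p}{\lambda},
    \;\;\lambda_j > 0 .
\]
Minimizing over the $\lambda_j$ (e.g., via the Cauchy--Schwarz inequality) collapses the adaptive penalty to $\tfrac{\lambda}{p}\bigl(\sum_{j}\lvert\beta_j\rvert\bigr)^{2}$, a squared $\ell_1$ penalty whose regularization path coincides with that of the lasso constraint above.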
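
The ``optimal transformation'' reading can be sketched in the same assumed notation: substituting $\beta_j = c_j\gamma_j$ with $c_j = \sqrt{\lambda/\lambda_j}$ turns the adaptive ridge problem into a standard ridge regression on rescaled inputs,
\[
  \min_{\boldsymbol\gamma,\,\boldsymbol c}\;
  \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{p}\gamma_j\,(c_j x_{ij})\Bigr)^{2}
  + \lambda\sum_{j=1}^{p}\gamma_j^{2}
  \quad\text{subject to}\quad
  \sum_{j=1}^{p} c_j^{2} = p,\;\; c_j \ge 0 ,
\]
so the scaling factors $c_j$ play the role of the transformation of the input variables, and variable $j$ is effectively discarded whenever its optimal scaling $c_j$ vanishes.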