## Lasso-type estimators for variable selection

Yves Grandvalet, Heudiasyc, UMR CNRS 6599, Yves.Grandvalet@utc.fr

Stéphane Canu, PSI, INSA de Rouen, stephane.canu@insa-rouen.fr

In generalized linear models, variable
selection can be achieved by shrinkage operators such as the lasso (Least
Absolute Shrinkage and Selection Operator). This estimate has two main
advantages over subset selection methods: first, it can be computed by standard
continuous optimization procedures; second, the estimate varies smoothly
with the learning set and with the hyper-parameter setting. As a result,
the method is stable with respect to slight changes in data and with respect
to errors in the hyper-parameter tuning. In a previous study, we showed
that the lasso is equivalent to an adaptive ridge regression estimate which
can be motivated as an attempt to determine the best quadratic penalization
for the problem at hand. A different form of this result may be presented
from the viewpoint of variable selection: the lasso can be interpreted as the
usual ridge estimate applied to an "optimal" transformation of the input
variables. This viewpoint suggests several generalizations of the lasso-type
estimator for performing variable selection with more complex models such
as additive models, neural networks or kernel machines.
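The lasso/adaptive-ridge equivalence mentioned above can be illustrated numerically. The sketch below is an assumption-laden toy example, not the paper's algorithm: it uses an orthonormal design (where the lasso has a closed form, soft-thresholding of the ordinary least-squares coefficients) and iterates a ridge fit whose per-coefficient quadratic penalty is reweighted to `lam / |beta_j|`. The iterates converge to the lasso solution; all variable names and the penalty scaling are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal design: columns of X satisfy X.T @ X = I, so the lasso
# solution of (1/2)||y - X b||^2 + lam * ||b||_1 is a soft threshold
# of the OLS estimate b_ols = X.T @ y.
n, p = 8, 4
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
beta_true = np.array([2.0, -1.5, 0.0, 0.0])
y = X @ beta_true
lam = 0.5

b_ols = X.T @ y
lasso = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

# Adaptive ridge: repeatedly solve a ridge problem whose quadratic
# penalty weights are rescaled to lam / |beta_j| (iteratively
# reweighted least squares); small coefficients get ever-larger
# penalties and are driven to zero, exactly as in the lasso.
beta = np.ones(p)
for _ in range(200):
    w = lam / np.maximum(np.abs(beta), 1e-12)  # per-coefficient penalty
    beta = np.linalg.solve(X.T @ X + np.diag(w), b_ols)

print(lasso)  # soft-thresholded OLS coefficients
print(beta)   # adaptive-ridge fixed point: the same values
```

In this toy setting the fixed point of the reweighted ridge iteration satisfies the lasso stationarity condition `beta_j + lam * sign(beta_j) = b_ols_j`, which is why the two printed vectors agree.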