## Dimensionality Reduction via Sparse Support Vector Machines

Jinbo Bi (1), Kristin P. Bennett (1), Mark Embrechts (2), Curt Breneman (3)

(1) Department of Mathematical Sciences

(2) Department of Decision Science and
Engineering Systems

(3) Department of Chemistry

Rensselaer Polytechnic Institute

110 8th Street

Troy, NY 12180

bij2@rpi.edu, bennek@rpi.edu, embrem@rpi.edu, brenec@rpi.edu

We describe a methodology for performing
variable selection and ranking using support vector machines (SVMs). The
basic idea of the method is simple: construct a series of sparse linear
SVMs that exhibit good generalization, take the subset of variables
having nonzero weights in those linear models, and then use this subset of variables
in a nonlinear SVM to produce the final regression or classification function.
The method exploits the fact that a linear SVM with 1-norm regularization
(no kernels) inherently performs variable selection as a side-effect of
minimizing capacity in the SVM model. In a linear 1-norm SVM, the optimal
weight vector will have relatively few nonzero weights, with the degree
of sparsity depending on the SVM model parameters. The variables with nonzero
weights then become candidate attributes for the nonlinear SVM.
In some sense, we trade the variable selection problem for the model parameter
selection problem in SVM.
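The two-stage procedure can be sketched as follows. This is a minimal illustration using scikit-learn, which is an assumption of ours rather than the authors' implementation; the specific classes (`LinearSVC`, `SVC`), the synthetic data, and the parameter values (e.g. `C=0.1`) are chosen only to demonstrate the idea that the 1-norm penalty drives some weights exactly to zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC

# Synthetic data: 100 samples, 20 variables, only 5 of them informative.
X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=0)

# Stage 1: sparse linear SVM with 1-norm (L1) regularization.
# The model parameter C controls the degree of sparsity.
sparse_svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000)
sparse_svm.fit(X, y)

# Keep only the variables with nonzero weights.
selected = np.flatnonzero(np.abs(sparse_svm.coef_.ravel()) > 1e-6)
print(f"Selected {len(selected)} of {X.shape[1]} variables:", selected)

# Stage 2: nonlinear (kernel) SVM trained on the reduced variable set.
final_model = SVC(kernel="rbf").fit(X[:, selected], y)
print("Training accuracy:", final_model.score(X[:, selected], y))
```

Tuning `C` (and the kernel parameters of the second-stage SVM) then plays the role that explicit variable-subset search would otherwise play.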

Our methodology has proven to be very effective
on regression problems in drug design: the number of variables is dramatically
reduced, and in cross-validation testing the method outperforms SVM models
trained using all the attributes. Chemists have also found the visualization
of the weight sensitivities to be useful. We are currently testing the approach on
classification tasks and hope to test the method on the workshop datasets
as well.