NIPS 2001 workshop on Variable and Feature Selection
December 6-8, 2001
Variable selection refers to the problem of selecting the input variables that are most predictive of a given outcome. Variable selection problems are found in all machine learning tasks, supervised or unsupervised, classification, regression, time series prediction, two-class or multi-class, posing various levels of challenges. Feature selection refers to the selection of an optimum subset of features derived from these input variables. Thus variable selection is distinct from feature selection if the inducer (classifier, regression machine, etc.) operates in a feature space that is not the input space. However, in the rest of this text we sometimes use the terms variable and feature interchangeably.
In recent years, variable selection has become the focus of much research in several areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing, particularly in application to Internet documents, and genomics, particularly gene expression array data. The objective of variable selection is two-fold: improving the prediction performance of the inducers and providing a better understanding of the underlying concept that generated the data.
Variable/feature selection problems are related to the problems of input dimensionality reduction and of parameter pruning. All these problems revolve around the capacity control of the inducer and are instances of the model selection problem. However, variable selection has practical and theoretical challenges of its own. First, the mathematical statement of the problem is not widely agreed upon and may depend on the application. One typically distinguishes:
From a theoretical point of view, the model selection problem of feature selection is notoriously hard. Even harder is the simultaneous selection of the features and the learning machine, or, in the case of unsupervised learning, the simultaneous selection of the features and the number of clusters. There is experimental evidence that greedy methods work better than combinatorial search, but a learning-theoretic analysis of the underlying regularization mechanisms remains to be done. Other theoretical challenges include estimating with what confidence one can state that a feature is relevant to the concept when it is useful to the inducer. Finally, rating the variable/feature selection methods also poses challenges.
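To make the contrast between greedy and combinatorial search concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration and is not taken from the workshop material: the toy data generator `make_data`, the nearest-centroid inducer used for scoring, and all function names. It only shows how the two search strategies differ in the number of subsets they evaluate; it does not reproduce the experimental evidence mentioned above.

```python
import random
from itertools import combinations


def make_data(n=200, n_features=8, relevant=(0, 3), noise=0.5, seed=0):
    """Toy two-class data (hypothetical): the label depends only on `relevant`."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        signal = sum(row[j] for j in relevant) + rng.gauss(0.0, noise)
        X.append(row)
        y.append(1 if signal > 0 else -1)
    return X, y


def make_scorer(X_train, y_train, X_val, y_val):
    """Return score(subset): validation accuracy of a nearest-centroid classifier."""
    def score(subset):
        cols = sorted(subset)
        centroids = {}
        for label in (-1, 1):
            rows = [x for x, t in zip(X_train, y_train) if t == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        correct = 0
        for x, t in zip(X_val, y_val):
            dist = {
                label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for label, cen in centroids.items()
            }
            correct += min(dist, key=dist.get) == t
        return correct / len(y_val)
    return score


def greedy_forward(score, n_features, max_size):
    """Greedy search: add, one at a time, the feature that most improves the score."""
    selected, best, n_evals = [], 0.0, 0
    while len(selected) < max_size:
        trials = []
        for j in range(n_features):
            if j not in selected:
                trials.append((score(selected + [j]), j))
                n_evals += 1
        top_score, top_j = max(trials)
        if top_score <= best:  # no candidate improves the score: stop early
            break
        selected.append(top_j)
        best = top_score
    return sorted(selected), best, n_evals


def exhaustive(score, n_features, max_size):
    """Combinatorial search: evaluate every subset of at most `max_size` features."""
    best_subset, best, n_evals = [], 0.0, 0
    for k in range(1, max_size + 1):
        for subset in combinations(range(n_features), k):
            s = score(subset)
            n_evals += 1
            if s > best:
                best_subset, best = list(subset), s
    return best_subset, best, n_evals


X, y = make_data()
score = make_scorer(X[:100], y[:100], X[100:], y[100:])
greedy_result = greedy_forward(score, n_features=8, max_size=3)
exhaustive_result = exhaustive(score, n_features=8, max_size=3)
print("greedy:    ", greedy_result)
print("exhaustive:", exhaustive_result)
```

Each result is a tuple (selected features, validation accuracy, number of subsets evaluated). The exhaustive search can never score lower than the greedy one on the validation split it optimizes, but it evaluates far more subsets, which is also why it has more opportunity to overfit that split.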
Because the topic is very active and many questions remain open, we anticipate that this workshop will be a forum for active discussions. One of the objectives of the workshop will be to discuss the modalities of a competition on variable selection that may be organized for NIPS 2002. We expect that the process of designing the competition will allow us to clarify the mathematical statement of the problem and methods for rating variable/feature selection methods.
Prospective participants are invited to report results of variable selection on the suggested data sets and to reflect upon the problem of designing a good competition for variable selection. Code to generate the artificial data set will also be made available for review and suggestions.
For example, a possible competition format for next year would be the following: Datasets would be posted several months before the workshop. The participants would be given a few weeks to run these data sets through their variable selection algorithm. They would then return their selection of variables, in exchange for which they would be given the test set, for these variables only. They would have to return the predicted outputs on the test data. The result submissions would be compared on the basis of three criteria: compactness of the feature set, test performance, and number of submissions per individual/organization.
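The exchange described above could be sketched as follows. This is only an illustration under stated assumptions: the names `Submission`, `restrict`, and `evaluate` are hypothetical, and the concrete metrics (fraction of variables kept for compactness, error rate for test performance) are guesses, since the proposal does not define them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Submission:
    team: str
    selected: list      # indices of the variables the participant kept
    predictions: list   # predicted outputs on the test set


def restrict(X, selected):
    """Return the test set with only the selected variables (columns)."""
    return [[row[j] for j in selected] for row in X]


def evaluate(submission, n_variables, y_test):
    """Compute compactness and test error for one submission (assumed metrics)."""
    fraction_selected = len(set(submission.selected)) / n_variables
    errors = sum(p != t for p, t in zip(submission.predictions, y_test))
    return {
        "team": submission.team,
        "fraction_selected": fraction_selected,
        "test_error": errors / len(y_test),
    }


# Toy test set held by the organizers: 4 examples, 4 variables.
X_test = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5, 1.6],
]
y_test = [1, -1, 1, -1]

# What a participant who selected variables 0 and 2 would receive.
released = restrict(X_test, [0, 2])

submissions = [
    Submission("team_a", [0, 2], [1, -1, -1, -1]),
    Submission("team_b", [1], [1, 1, -1, -1]),
    Submission("team_a", [0], [1, -1, 1, 1]),
]
reports = [evaluate(s, n_variables=4, y_test=y_test) for s in submissions]
submission_counts = Counter(s.team for s in submissions)  # third criterion

for report in reports:
    print(report, "submissions:", submission_counts[report["team"]])
```

Counting submissions per team matters because repeated submissions against the same test set amount to tuning on it.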
The workshop is open to anybody who has an interest in variable/feature selection and there are no prerequisites to attend. An introduction to the basics of the problem can be found in the tutorial "Wrappers for Feature Selection" by Ron Kohavi and George John http://citeseer.nj.nec.com/13663.html.
The workshop is over. Prospective participants were invited to submit a one- or two-page summary and to report experiments, preferably on the proposed data sets. Some slides and papers are available in the schedule. The proceedings will be published as a special issue of the Journal of Machine Learning Research (see the call for papers).
Several real-world data sets can be downloaded from the Stanford Microarray Database
Preliminary artificial data generation program for a linear two-class classification problem (Isabelle Guyon)
A problem of linear regression by Leo Breiman
The noisy LED display problem
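For readers unfamiliar with the last item, the noisy LED display problem can be generated along the following lines. This sketch relies on the usual description of the benchmark in the literature, which is an assumption here and not something stated on this page: ten digit classes, seven binary segment inputs each inverted independently with probability 0.1, and a common variant that appends 17 irrelevant random binary inputs. The function name `noisy_led` and its parameters are hypothetical.

```python
import random

# Segment order: top, upper-right, lower-right, bottom, lower-left, upper-left, middle.
SEGMENTS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def noisy_led(n, flip_prob=0.1, n_irrelevant=17, seed=0):
    """Generate n examples: 7 noisy segment bits followed by irrelevant random bits."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        digit = rng.randrange(10)
        # Invert each segment independently with probability flip_prob.
        segments = [s ^ (rng.random() < flip_prob) for s in SEGMENTS[digit]]
        # Irrelevant inputs carry no information about the digit.
        extra = [rng.randrange(2) for _ in range(n_irrelevant)]
        X.append(segments + extra)
        y.append(digit)
    return X, y


X_led, y_led = noisy_led(5)
for row, digit in zip(X_led, y_led):
    print(digit, row[:7], row[7:])
```

A variable selection method applied to such data would ideally keep the first seven inputs and discard the rest, which makes the problem a convenient sanity check.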
Friday, December 7th:
7:30-8:00 a.m. Welcome and introduction to the problem of feature/variable selection - Isabelle Guyon - [ppt slides]
8:00-8:20 a.m. Dimensionality Reduction via Sparse Support Vector Machines - Jinbo Bi, Kristin P. Bennett, Mark Embrechts, and Curt Breneman - [Gunzipped ppt slides]
8:20-8:40 a.m. Feature selection for non-linear SVMs using a gradient descent algorithm - Olivier Chapelle and Jason Weston
8:40-9:00 a.m. When Rather Than Whether: Developmental Variable Selection - Melissa Dominguez - [slides in ppt]
9:00-9:20 a.m. Pause, free discussions.
9:20-9:40 a.m. How to recycle your SVM code to do feature selection - Andre Elisseeff and Jason Weston
10:00-10:30 a.m. Discussion. What are the various statements of the variable selection problem?
4:00-4:20 p.m. Using DRCA to see the effects of variable combinations on classifiers - Ofer Melnik - [paper in ps or pdf]
4:20-4:40 p.m. Feature selection in the setting of many irrelevant features - Andrew Y. Ng and Michael I. Jordan
4:40-5:00 p.m. Relevant coding and information bottlenecks: A principled approach to multivariate feature selection - Naftali Tishby - [slides in ppt]
5:00-5:20 p.m. Learning discriminative feature transforms may be an easier problem than feature selection - Kari Torkkola - [slides in pdf]
5:20-5:30 p.m. Pause.
5:30-6:30 p.m. Discussion. Organization of a future workshop with a feature selection algorithm benchmark.
6:30-7:00 p.m. Impromptu talks. People interested in giving last-minute short presentations should make themselves known at the time of the workshop.
BIOwulf Technologies 2030 Addison Street, Suite 102, Berkeley CA 94704 phone: 510-883-7220 fax: 510-883-7223