NIPS 2001 workshop on Variable and Feature Selection

December 6-8, 2001 
Delta Whistler Resort, British Columbia, Canada

Problem description 

Variable selection refers to the problem of selecting input variables that are most predictive of a given outcome. Variable selection problems are found in all machine learning tasks, supervised or unsupervised, classification, regression, time series prediction, two-class or multi-class, posing various levels of challenge. Feature selection refers to the selection of an optimum subset of features derived from these input variables. Thus variable selection is distinct from feature selection when the inducer (classifier, regression machine, etc.) operates in a feature space that is not the input space. In the rest of this text, however, we sometimes use the terms variable and feature interchangeably.

In recent years, variable selection has become the focus of much research in several areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing, particularly in application to Internet documents, and genomics, particularly gene expression array data. The objective of variable selection is two-fold: improving the prediction performance of the inducers and providing a better understanding of the underlying concept that generated the data.

Challenges

Variable/feature selection problems are related to the problems of input dimensionality reduction and of parameter pruning. All these problems revolve around the capacity control of the inducer and are instances of the model selection problem. However, variable selection has practical and theoretical challenges of its own. First, the mathematical statement of the problem is not widely agreed upon and may depend on the application. One typically distinguishes:
(i) the problem of discovering all the variables relevant to the concept (and determining how relevant they are and how they relate to one another) from (ii) the problem of finding a minimum subset of variables (or alternative subsets) that are useful to the inducer (i.e. provide good generalization). But there are many variants of the statement. For some applications, intermediate products such as variable rankings, variable subset rankings, and search trees are particularly important. These intermediate products may be combined with other selection criteria from independent data sources. They may also allow the user to easily explore the trade-off between inducer performance and feature set compactness. Determining an optimum number of features is then a separate model selection problem.

The nomenclature of approaches to the problem is not well established either. Methods assessing the quality of feature subsets according to the prediction error of an inducer are called wrapper methods. Those using criteria, such as correlation coefficients, that do not involve the inducer are called filter methods. But in reality there is a whole range of methods, including methods that embed feature selection in the learning algorithm itself. Other distinctions can be made according to whether the feature selection is supervised or unsupervised, and whether the inducer is multivariate or univariate.
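To make the filter/wrapper distinction concrete, here is a minimal sketch in Python with numpy. It is purely illustrative: the function names, the nearest-class-mean inducer, and the cross-validation set-up are our own assumptions and do not correspond to any particular method presented at the workshop.

import numpy as np

def filter_scores(X, y):
    """Filter criterion: rank each variable by the absolute Pearson
    correlation between that variable and the outcome (no inducer involved)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.abs(corr)

def wrapper_score(X, y, subset, n_folds=5):
    """Wrapper criterion: estimate the prediction error of an inducer
    (here a nearest-class-mean classifier, labels assumed in {0, 1})
    trained on the candidate subset of variables."""
    n = len(y)
    folds = np.array_split(np.random.permutation(n), n_folds)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        Xtr, Xte = X[np.ix_(train_idx, subset)], X[np.ix_(test_idx, subset)]
        ytr, yte = y[train_idx], y[test_idx]
        means = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
        pred = np.argmin(((Xte[:, None, :] - means) ** 2).sum(axis=2), axis=1)
        errors.append(np.mean(pred != yte))
    return np.mean(errors)

The filter ranks variables once, independently of any learning machine; the wrapper re-trains the inducer for every candidate subset, which is more faithful to the end goal but much more expensive.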

From the theoretical point of view, the model selection problem of feature selection is notoriously hard. Even harder is the simultaneous selection of the features and the learning machine or, in the case of unsupervised learning, of the features and the number of clusters. There is experimental evidence that greedy methods work better than combinatorial search, but a learning-theoretic analysis of the underlying regularization mechanisms remains to be done. Other theoretical challenges include estimating with what confidence one can state that a feature is relevant to the concept when it is useful to the inducer. Finally, rating the variable/feature selection methods themselves also poses challenges.
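As an illustration of the greedy alternative to combinatorial search, here is a sketch of plain forward selection (again illustrative only; score_fn stands for any subset-quality estimate to be minimized, for example the hypothetical wrapper_score above):

import numpy as np

def forward_selection(X, y, score_fn, max_features):
    """Greedy forward selection: at each step add the single variable whose
    inclusion gives the lowest estimated error, instead of searching over
    all 2^d subsets. score_fn(X, y, subset) returns the error estimate."""
    selected = []
    remaining = list(range(X.shape[1]))
    history = []
    for _ in range(min(max_features, X.shape[1])):
        best_err, best_j = min((score_fn(X, y, np.array(selected + [j])), j)
                               for j in remaining)
        selected.append(best_j)
        remaining.remove(best_j)
        history.append((list(selected), best_err))
    return history  # exposes the trade-off between compactness and error

Returning the whole history, rather than a single subset, leaves the choice of the number of features as a separate model selection step, as discussed above.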

Because the topic is timely and many questions remain open, we anticipate that this workshop will be a forum for active discussions. One of the objectives of the workshop will be to discuss the modalities of a competition on variable selection, possibly to be organized for NIPS 2002. We expect that the process of designing the competition will allow us to clarify the mathematical statement of the problem and the methods for rating variable/feature selection algorithms.

Workshop format 

Prospective participants are invited to report results of variable selection on the suggested data sets and to reflect upon the problem of designing a good competition on variable selection. Code to generate the artificial data set will also be made available for review and suggestions.

For example, a possible competition format for next year would be the following: Datasets would be posted several months before the workshop. The participants would be given a few weeks to run these data sets through their variable selection algorithms. They would then return their selection of variables, in exchange for which they would be given the test set, restricted to these variables only. They would have to return the predicted outputs on the test data. The submissions would be compared on the basis of three criteria: compactness of the feature set, test performance, and number of submissions per individual/organization. 
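As a rough illustration of how such submissions might be scored, here is a sketch under assumed conventions; the plain error-rate metric and the field names are our own choices, not agreed-upon competition rules.

import numpy as np

def score_submission(selected_vars, y_pred, y_test, n_total_vars):
    """Score one entry by the two announced criteria: test performance
    (here plain error rate, one possible choice) and feature-set compactness."""
    error_rate = float(np.mean(np.asarray(y_pred) != np.asarray(y_test)))
    fraction_selected = len(set(selected_vars)) / n_total_vars
    return {"test_error": error_rate, "fraction_selected": fraction_selected}

How to combine the two criteria into a single ranking (or whether to rank on one at a fixed level of the other) is precisely one of the questions the workshop discussion is meant to settle.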

Workshop audience 

The workshop is open to anybody with an interest in variable/feature selection; there are no prerequisites to attend. An introduction to the basics of the problem can be found in the paper "Wrappers for Feature Subset Selection" by Ron Kohavi and George John: http://citeseer.nj.nec.com/13663.html

Submission

The workshop is over. Prospective participants were invited to submit one or two pages of summary and to report experiments, preferably on the proposed data sets. Some slides and papers are available in the schedule below. The proceedings will be published as a special issue of the Journal of Machine Learning Research (see the call for papers).

Data sets

Several real-world data sets can be downloaded from the Stanford Microarray Database
http://genome-www4.stanford.edu/MicroArray/SMD/publications.html
In particular:
Perou, C. M., et al. (2000) Molecular portraits of human breast tumours. Nature 406:747-752
Ross, D., et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics 24:227-235.
Alizadeh, A.A., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503-511.

Preliminary artificial data generation program for a linear two-class classification problem (Isabelle Guyon):
5 kb zip file with Matlab code
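The actual generator is the Matlab program linked above. For readers without Matlab, here is a rough Python sketch of a comparable linear two-class problem; the dimensions, noise level, and number of relevant variables below are arbitrary choices and do not reproduce the actual program.

import numpy as np

def make_linear_two_class(n=100, d=1000, n_relevant=10, noise=1.0, seed=0):
    """Linear two-class problem in which only the first n_relevant of the
    d variables carry information about the label."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n) * 2 - 1      # labels in {-1, +1}
    X = rng.normal(scale=noise, size=(n, d))    # irrelevant background noise
    X[:, :n_relevant] += y[:, None]             # shift relevant variables by the class
    return X, y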

A linear regression problem by Leo Breiman, provided by Yves Grandvalet, found in:
Breiman, L. (1996) Heuristics of instability and stabilization in model selection. The Annals of Statistics 24(6):2350-2383.
3 kb Matlab file
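For readers who cannot run the Matlab file, here is a generic sketch of a sparse linear regression problem in the same spirit. This is not the exact design of Breiman's experiments; the correlation structure, coefficient values, and sizes are assumptions for illustration only.

import numpy as np

def make_sparse_regression(n=60, d=30, n_relevant=5, rho=0.5, noise=1.0, seed=0):
    """Linear regression with correlated inputs where only a few
    coefficients are non-zero, so that variable selection matters."""
    rng = np.random.default_rng(seed)
    # AR(1)-style covariance: correlation decays with the index distance
    cov = rho ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
    X = rng.multivariate_normal(np.zeros(d), cov, size=n)
    beta = np.zeros(d)
    beta[:n_relevant] = rng.uniform(1.0, 2.0, size=n_relevant)
    y = X @ beta + noise * rng.standard_normal(n)
    return X, y, beta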

The noisy LED display problem
A problem studied by Yves Grandvalet in his presentation. 
3 kb Matlab file
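For those who prefer not to use Matlab, here is a hedged Python sketch of the classic noisy LED display problem: the seven segments of a digit display, each flipped with some probability, plus irrelevant binary distractor variables. The segment patterns below are the usual 7-segment encoding; the noise level, sample size, and number of irrelevant variables are arbitrary choices and may differ from the linked file.

import numpy as np

# A common 7-segment encoding of the digits 0-9 (segments a-g).
SEGMENTS = np.array([
    [1,1,1,1,1,1,0],  # 0
    [0,1,1,0,0,0,0],  # 1
    [1,1,0,1,1,0,1],  # 2
    [1,1,1,1,0,0,1],  # 3
    [0,1,1,0,0,1,1],  # 4
    [1,0,1,1,0,1,1],  # 5
    [1,0,1,1,1,1,1],  # 6
    [1,1,1,0,0,0,0],  # 7
    [1,1,1,1,1,1,1],  # 8
    [1,1,1,1,0,1,1],  # 9
])

def make_noisy_led(n=500, flip_prob=0.1, n_irrelevant=17, seed=0):
    """Noisy LED display data: the 7 informative segments of a random digit,
    each flipped with probability flip_prob, plus random irrelevant binary
    variables appended as distractors."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 10, size=n)
    X = SEGMENTS[y].copy()
    flips = rng.random(X.shape) < flip_prob
    X = np.abs(X - flips.astype(int))           # flip the selected segments
    noise = rng.integers(0, 2, size=(n, n_irrelevant))
    return np.hstack([X, noise]), y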

Schedule

Friday, December 7th: 
7:30-10:30 a.m. and 4:00-7:00 p.m.

7:30-8:00: Welcome and introduction to the problem of feature/variable selection - Isabelle Guyon - [ppt slides]

8:00-8:20 a.m. Dimensionality Reduction via Sparse Support Vector Machines - Jinbo Bi, Kristin P. Bennett, Mark Embrechts, and Curt Breneman - [gzipped ppt slides]

8:20-8:40 a.m. Feature selection for non-linear SVMs using a gradient descent algorithm - Olivier Chapelle and Jason Weston -

8:40-9:00 a.m. When Rather Than Whether: Developmental Variable Selection - Melissa Dominguez - [slides in ppt]

9:00-9:20 a.m. Pause, free discussions.

9:20-9:40 How to recycle your SVM code to do feature selection - Andre Elisseeff and Jason Weston -

9:40-10:00 Lasso-type estimators for variable selection - Yves Grandvalet and Stéphane Canu - [slides in ps]

10:00-10:30 a.m. Discussion. What are the various statements of the variable selection problem?

4:00-4:20 p.m. Using DRCA to see the effects of variable combinations on classifiers - Ofer Melnik - [paper in ps or pdf]

4:20-4:40 p.m. Feature selection in the setting of many irrelevant features - Andrew Y. Ng and Michael I. Jordan -

4:40-5:00 p.m. Relevant coding and information bottlenecks: A principled approach to multivariate feature selection - Naftali Tishby - [slides in ppt]

5:00-5:20 p.m. Learning discriminative feature transforms may be an easier problem than feature selection - Kari Torkkola - [slides in pdf]

5:20-5:30 p.m. Pause.

5:30-6:30 p.m. Discussion. Organization of a future workshop with a feature selection algorithm benchmark.

6:30-7:00 p.m. Impromptu talks. People interested in giving last-minute short presentations should make themselves known at the time of the workshop.

Affiliation with other workshops 

NIPS 2000
Cross-Validation, Bootstrap, and Model Selection (variable/feature selection is a particular case of model selection) 
New Perspectives in Kernel-Based Learning Methods (a lot of recent variable selection methods use kernel methods, including SVMs) 
Using Unlabeled Data for Supervised Learning (competition organised)

CAMDA 2001
Critical Assessment of Microarray Data (competition on microarray data).

Workshop chair 

Isabelle Guyon
BIOwulf Technologies 
2030 Addison Street, 
suite 102 
Berkeley, CA 94704
U.S.A.
Tel: (510) 883 7220 
Fax: (510) 883 7223 
isabelle@clopinet.com

