Feature Extraction, Foundations and Applications
This book is organized around the results of a benchmark that took
place in 2003 (the website of the challenge is still active) and whose
results were discussed at the NIPS 2003 workshop on feature extraction.
Dozens of research groups competed on five large feature selection
problems from various application domains: medical diagnosis, text
processing, drug discovery, and handwriting recognition. The results of
this effort pave the way for a new generation of methods capable of
analyzing data tables with millions of lines and/or columns. This book
is a step towards validating, unifying, and formalizing these approaches.
The data and sample code, with useful baseline methods implemented in
Matlab, are available as a single big archive (93 MB). Course material
is also available.
FREE BOOK INTRODUCTION
(courtesy of the publisher).
"This book compiles some very promising techniques in feature selection and supervised learning, coming from an extremely smart collection of researchers, delivering their best ideas in a competitive environment." -- Trevor Hastie, Stanford University
"Feature Selection is a key technology for making sense of the high dimensional data which surrounds us. Isabelle Guyon et al. have done a splendid job in designing a challenging competition, and collecting the lessons learned." -- Bernhard Schoelkopf, Max Planck Institute for Biological Cybernetics
Feature Extraction, one of the pillars of machine learning, finds application in biotechnology, industrial inspection, the Internet, radar, sonar, and speech recognition, just to name a few. This new book, edited by Isabelle Guyon, Steve Gunn, Masoud Nikravesh and Lotfi Zadeh, is unique: - The book owes its origin to a competition, followed by a Neural Information Processing Systems (NIPS) workshop that was held in December 2003. - Most important, the book embodies many of the state-of-the-art methods in feature extraction. Simply put, the book will make a difference to the literature on machine learning. -- Simon Haykin, McMaster University
Foreword by Peter Norvig
Everyone loves a good competition. As I write this, two billion fans are
eagerly anticipating the 2006 World Cup. Meanwhile, a fan base that is
somewhat smaller (but presumably includes you, dear reader) is equally eager
to read all about the results of the NIPS 2003 Feature Selection Challenge,
contained herein. Fans of Radford Neal and Jianguo Zhang (or of Bayesian
neural networks and Dirichlet diffusion trees) are gloating "I told you
so" and looking for proof that their win was not a fluke. But the matter
is by no means settled, and fans of SVMs are shouting "wait 'til next year!"
You know this book is a bit more edgy than your standard academic treatise
as soon as you see the dedication: "To our friends and foes."
Competition breeds improvement. Fifty years ago, the champion in the
100 m butterfly was 22% slower than today's champion; the women's marathon
champion from just 30 years ago was 26% slower. Who knows how much
better our machine learning algorithms would be today if Turing in 1950
had proposed an effective competition rather than his elusive Test?
But what makes an effective competition? The field of Speech Recognition
has had NIST-run competitions since 1988; error rates have been reduced
by a factor of three or more, but the field has not yet had the impact expected
of it. Information Retrieval has had its TREC competition since 1992;
progress has been steady and refugees from the competition have played
important roles in the hundred-billion-dollar search industry. Robotics
has had the DARPA Grand Challenge for only two years, but in that time we
have seen the results go from complete failure to resounding success (although
it may have helped that the
second year's course was somewhat easier than the first's).
I think there are four criteria that define an effective technical
competition: it must be approachable enough to attract many entrants;
its scoring must discriminate among them; its results must be open and
reproducible; and it must be relevant to the real world.
The Feature Selection Challenge meets the first three criteria easily.
75 teams entered, so they must have found it approachable. The scoring
did a good job of separating the top performers while keeping everyone on
the scale. And the results are all available online, in this book,
and on the accompanying CD. All the data and Matlab code are provided,
so the Challenge is easily reproducible. The level of explication provided
by the entrants in the chapters of this book is higher than in other similar
competitions. The fourth criterion, real-world relevance, is perhaps
the hardest to achieve. Only time will tell whether the Feature Selection
Challenge meets this one. In the meantime, this book sets a high standard
as the public record of an interesting and effective competition.
-- Peter Norvig
Palo Alto, CA
Table of Contents
An Introduction to Feature Extraction
Isabelle Guyon, André Elisseeff
Feature extraction includes feature construction, space dimensionality
reduction, sparse representations, and feature selection. All these techniques
are commonly used as preprocessing for machine learning and statistical
prediction tasks, including pattern recognition and regression. Although
such problems have been tackled by researchers for many years, there has
recently been renewed interest in feature extraction. A number of new
applications with very large input spaces critically need space dimensionality
reduction for efficiency and efficacy of the predictors. These applications
include bioinformatics (DNA microarrays, mass-spectrometric data, etc.),
combinatorial chemistry (e.g. high throughput screening of drug candidates),
text processing (e.g. spam filtering), decision making (e.g. oil drilling),
pattern recognition (e.g. handwriting recognition), speech processing,
and vision.
An earlier tutorial by the same authors was published in JMLR.
Part I: Feature Extraction Fundamentals
The first part of the book consists of tutorial chapters that synthesize
recent approaches and new theoretical advances.
1) Learning machines
Krzysztof Grabczewski & Norbert Jankowski
A brief tutorial on some of the most widely used learning machines
(linear discriminant analysis, naïve Bayes, neural networks, fuzzy logic,
kernel methods including SVMs, decision trees, clustering). The goal of
this chapter is to allow the reader to read the other chapters without
referring to other textbooks.
2) Assessment methods
Gerard Dreyfus
Fundamentals of statistical testing and cross-validation. The goal
of this chapter is to provide the reader with the necessary tools to assess
the goodness of selected features in a statistically significant way.
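The book's baseline code is in Matlab; purely as an illustration of the kind of procedure this chapter formalizes, here is a minimal Python sketch of k-fold cross-validation. The function names (`k_fold_cv`, `train_fn`, `error_fn`) and the toy majority-class classifier are invented for this example and do not come from the book.

```python
import random

def k_fold_cv(X, y, train_fn, error_fn, k=5, seed=0):
    """Estimate generalization error by k-fold cross-validation.

    train_fn(X_train, y_train) returns a predictor f(x) -> label;
    error_fn(f, X_test, y_test) returns the error on held-out data.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out folds
    errors = []
    for fold in folds:
        held = set(fold)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        X_te = [X[i] for i in fold]
        y_te = [y[i] for i in fold]
        f = train_fn(X_tr, y_tr)
        errors.append(error_fn(f, X_te, y_te))
    return sum(errors) / k

# Toy example: a majority-class "classifier" on an imbalanced problem.
def train_majority(X, y):
    majority = max(set(y), key=y.count)
    return lambda x: majority

def zero_one_error(f, X, y):
    return sum(f(x) != t for x, t in zip(X, y)) / len(y)

X = [[float(i)] for i in range(20)]
y = [0] * 15 + [1] * 5
print(k_fold_cv(X, y, train_majority, zero_one_error, k=5))  # 0.25
```

The estimate 0.25 is exactly the minority-class fraction, since the majority predictor errs on every held-out positive example.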
3) Filter methods
Wlodzislaw Duch
Correlation methods and other methods for ranking features in order
of relevance. The goal of this chapter is to provide the reader with
easy-to-implement methods, which are also important baseline methods.
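As a sketch of the kind of easy-to-implement baseline this chapter covers, here is correlation-based feature ranking in Python (the book's own code is in Matlab): each feature is scored by the absolute Pearson correlation of its column with the target. The function names are hypothetical, chosen for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_features(X, y):
    """Rank feature indices by |correlation| with the target, best first.

    X is a list of samples, each a list of feature values; y is the target.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        scores.append((abs(pearson(col, y)), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Toy data: feature 0 tracks the target, feature 1 is noise.
X = [[1.0, 0.3], [2.0, -0.1], [3.0, 0.2], [4.0, -0.4]]
y = [1.0, 2.0, 3.0, 4.0]
print(rank_features(X, y))  # [0, 1]: the informative feature ranks first
```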
4) Search strategies
Juha Reunanen
This chapter covers techniques that can be combined with any off-the-shelf
learning machine package in a "wrapper" or "filter" feature selection
setting, including exhaustive search (with its computational and statistical
limitations), classical greedy search (forward, backward), best-first search,
branch and bound, beam search, genetic algorithms, and simulated annealing.
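To make the classical greedy strategy concrete, here is a minimal Python sketch of forward selection in the wrapper setting, where `score_fn` would typically be a cross-validated performance estimate of the off-the-shelf learner on the candidate subset. The function and variable names are invented for this example; the book's own baseline code is in Matlab.

```python
def forward_selection(features, score_fn, max_features=None):
    """Greedy forward selection: repeatedly add the feature that most
    improves score_fn(subset); stop when no candidate helps.

    score_fn takes a frozenset of feature indices; higher is better.
    """
    selected = frozenset()
    best_score = score_fn(selected)
    limit = max_features or len(features)
    while len(selected) < limit:
        candidates = [(score_fn(selected | {f}), f)
                      for f in features if f not in selected]
        score, f = max(candidates)
        if score <= best_score:  # no strict improvement: stop early
            break
        selected, best_score = selected | {f}, score
    return sorted(selected), best_score

# Toy additive score: features 0 and 2 are useful, the rest add nothing.
useful = {0: 4.0, 2: 3.0}
score = lambda s: sum(useful.get(f, 0.0) for f in s)
print(forward_selection(range(5), score))  # ([0, 2], 7.0)
```

Backward elimination is the mirror image (start from all features and greedily remove), and both run in O(d^2) evaluations rather than the 2^d of exhaustive search.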
5) Embedded methods
Navin Lal, Olivier
Chapelle, Jason Weston, Andre Elisseeff
This chapter covers algorithms that select features in the process
of learning, including gradient-based feature scaling, nested subset methods,
the use of performance bounds, direct objective optimization, and decision trees.
6) Information theoretic methods
Kari Torkkola
Methods derived from information-theoretic principles, for both feature
selection and feature construction, including the information bottleneck
and Markov blankets.
7) Ensemble learning
Eugene Tuv
Methods involving voting over several feature subsets, including Bayesian
approaches to feature selection and bagging/stability methods.
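One simple way to vote over feature subsets, in the spirit of the bagging/stability methods mentioned above, is to rerun a base selector on bootstrap resamples and count how often each feature is chosen. The Python sketch below (the book's code is Matlab) uses an invented covariance-based top-1 selector purely as a stand-in for any base selection method.

```python
import random

def selection_stability(X, y, select_fn, n_rounds=30, seed=0):
    """Run select_fn on bootstrap resamples and return, per feature,
    the fraction of rounds in which it was selected (a vote in [0, 1]).

    select_fn(X, y) returns a set of selected feature indices.
    """
    rng = random.Random(seed)
    n = len(X)
    counts = {}
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        chosen = select_fn([X[i] for i in idx], [y[i] for i in idx])
        for f in chosen:
            counts[f] = counts.get(f, 0) + 1
    return {f: c / n_rounds for f, c in counts.items()}

# Hypothetical base selector: keep the single feature with the largest
# absolute covariance with the target.
def top_feature(X, y):
    n = len(X)
    my = sum(y) / n
    best, best_cov = 0, -1.0
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mj = sum(col) / n
        cov = abs(sum((a - mj) * (b - my) for a, b in zip(col, y)))
        if cov > best_cov:
            best, best_cov = j, cov
    return {best}

# Feature 0 carries the signal; feature 1 is constant noise.
X = [[float(i), 0.0] for i in range(10)]
y = [float(i) for i in range(10)]
print(selection_stability(X, y, top_feature, n_rounds=10))  # {0: 1.0}
```

Features with high voting frequency are stable under resampling, which is one practical criterion for trusting a selected subset.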
8) Fuzzy Neural Networks
Madan M. Gupta, Noriyasu
Homma, Zeng-Guang Hou
A new generation of learning machines combining Fuzzy Logic and neural
networks.
Part II: Feature Selection Challenge
The second part of the book includes the results of the NIPS 2003
feature selection challenge. See the slides of
the NIPS 2003 workshop.
9) Design and Analysis of the NIPS 2003 Challenge
Isabelle Guyon, Steve Gunn,
Asa Ben Hur, Gideon Dror
The challenge attracted 75 competitors. Researchers all over the world
competed for 12 weeks on 5 sizeable datasets from a variety of applications,
with the number of input features ranging from 500 to 100,000. The goal was
to use as few input features as possible without sacrificing prediction performance.
The competitors succeeded in reducing the input space dimensionality by
orders of magnitude while retaining or improving performance. Some of the
most efficient methods are among the simplest and fastest. The benefits of
using such techniques include performance improvements in terms of prediction
speed and accuracy as well as gaining better data understanding by identifying
the inputs that are most predictive of a given outcome. A short
version was published in the NIPS proceedings.
10) High Dimensional Classification with Bayesian Neural Networks
and Dirichlet Diffusion Trees
Radford M. Neal, Jianguo Zhang
11) Ensembles of Regularized Least Squares Classifiers for High-Dimensional
Problems
Kari Torkkola and Eugene Tuv
12) Combining SVMs with Various Feature Selection Strategies
Yi-Wei Chen, Chih-Jen Lin
13) Feature Selection with Transductive Support Vector Machines
Zhili Wu, Chunhung Li
14) Variable Selection using Correlation and Single Variable Classifier
Methods: Applications
Amir Reza Saffari Azar Alamdari
15) Tree-Based Ensembles with Dynamic Soft Feature Selection
Alexander Borisov, Victor Eruhimov, Eugene Tuv
16) Sparse, Flexible and Efficient Modeling using L1 Regularization
Saharon Rosset, Ji Zhu
17) Margin Based Feature Selection and Infogain with Standard Classifiers
Ran Gilad-Bachrach, Amir Navot
18) Bayesian Support Vector Machines for Feature Ranking and Selection
Wei Chu, S. Sathiya Keerthi, Chong Jin Ong, Zoubin Ghahramani
19) Nonlinear Feature Selection with the Potential Support Vector
Machine
Sepp Hochreiter, Klaus Obermayer
20) Combining a Filter Method with SVMs
Thomas Navin Lal, Olivier Chapelle, Bernhard Schölkopf
21) Feature Selection via Sensitivity Analysis with Direct Kernel
PLS
Mark J. Embrechts, Robert A. Bress, Robert H. Kewley
22) Information Gain, Correlation and Support Vector Machines
Danny Roobaert, Grigoris Karakoulas, Nitesh V. Chawla
23) Mining for Complex Models Comprising Feature Selection and
Classification
Krzysztof Grabczewski, Norbert Jankowski
24) Combining Information-Based Supervised and Unsupervised Feature
Selection
Sang-Kyun Lee, Seung-Joon Yi, Byoung-Tak Zhang
25) An Enhanced Selective Naïve Bayes Method with Optimal
Discretization
Marc Boullé
26) An Input Variable Importance Definition based on Empirical
Data Probability Distribution
V. Lemaire, F. Clérot
Part III: New Perspectives in Feature Extraction
The last part of the book is devoted to recent advances in feature
extraction.
27) Spectral Dimensionality Reduction
Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-Francois Paiement,
Pascal Vincent, Marie Ouimet
28) Constructing Orthogonal Latent Features for Arbitrary Loss
Michinari Momma, Kristin P. Bennett
29) Large Margin Principles for Feature Selection
Ran Gilad-Bachrach, Amir Navot, Naftali Tishby
30) Feature Extraction for Classification of Proteomic Mass Spectra:
A Comparative Study
Ilya Levner, Vadim Bulitko, Guohui Lin
31) Sequence motifs: highly predictive features of protein function
Asa Ben-Hur, Douglas Brutlag
Appendices
A. Elementary Statistics
B. Feature Selection Challenge Datasets
C. Feature Selection Challenge Fact Sheets
D. Feature Selection Challenge Results Tables