Feature Selection 

Pascal bootcamp, Vilanova i la Geltrú, Spain

July 2-6, 2007



This course covers feature selection fundamentals and applications. Students will first be reminded of the basics of machine learning algorithms and of the problem of overfitting avoidance. In the wrapper setting, feature selection will be introduced as a special case of the model selection problem. Methods for deriving principled feature selection algorithms will be reviewed, as well as heuristic methods that work well in practice. One class will be devoted to feature construction techniques. Finally, a lecture will be devoted to the connections between feature selection and causal discovery. The class will be accompanied by several lab sessions. The course will appeal to students who like playing with data and want to learn practical data analysis techniques. The instructor has ten years of experience consulting for startup companies in the US in pattern recognition and machine learning. Datasets from a variety of application domains will be made available: handwriting recognition, medical diagnosis, drug discovery, text classification, ecology, and marketing.
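
To make the wrapper view concrete, here is a minimal sketch in Python. It is not taken from the course material or from the CLOP package. It treats each feature subset as a candidate model and uses the cross-validation error as the model selection criterion, adding features greedily. The nearest-centroid classifier, the synthetic data (only feature 0 is informative), and all function names are assumptions chosen for illustration.

```python
import random


def nearest_centroid_error(train, test, features):
    """Fit a nearest-centroid classifier on `features`; return the test error rate."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    errors = 0
    for x, y in test:
        # Squared Euclidean distance to each class centroid, on the chosen features only.
        dists = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        predicted = min(dists, key=dists.get)
        errors += predicted != y
    return errors / len(test)


def cv_error(data, features, k=5):
    """k-fold cross-validation estimate of the error for one feature subset."""
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        total += nearest_centroid_error(train, test, features)
    return total / k


def forward_selection(data, n_features, k=5):
    """Greedy wrapper: repeatedly add the feature that most reduces the CV error."""
    selected, best_err = [], 1.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(cv_error(data, selected + [f], k), f) for f in candidates]
        err, f = min(scored)
        if err >= best_err:  # no candidate improves the criterion: stop
            break
        selected.append(f)
        best_err = err
    return selected, best_err


# Synthetic two-class data (an assumption for illustration):
# feature 0 shifts with the label, features 1-4 are pure noise.
rng = random.Random(0)


def make_example():
    y = rng.randint(0, 1)
    x = [rng.gauss(2.0 * y, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(4)]
    return x, y


data = [make_example() for _ in range(200)]
selected, err = forward_selection(data, n_features=5)
print("selected features:", selected, "cross-validation error:", round(err, 3))
```

On this toy data the search should pick the informative feature first and stop once additional features no longer reduce the cross-validation error. Because the same cross-validation estimate is used both to choose the subset and to report its error, the reported error is optimistically biased; a separate test set would be needed for an honest estimate.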

The classes take place on the Vilanova campus of the Universitat Politècnica de Catalunya (UPC), July 2-6, 2007. The event is organized by Jose Luis Balcázar.

CLOP package installation
If you bring your own laptop, download the CLOP package.

Unzip the archive and follow the instructions in the README file.
Windows users only need to run a script that sets the Matlab path properly in order to use most functions.
Unix users will have to compile the LibSVM package if they want to use support vector machines. Please use the latest Makefile: Makefile_amir.

       See our support page for additional information. Note that there is another version of CLOP presently used for the ALvsPK challenge. We ask you not to use it.

       Feature extraction book:
       The class is based on a book, which compiles the results of the NIPS 2003 feature selection challenge and includes tutorial chapters.
       Download the feature extraction book introduction. Copies of the full book may be purchased.

       Suggested readings:

·    Structural risk minimization for character recognition
·    Kernel Ridge Regression tutorial
·    Linear discriminant and support vector classifiers
·    Causal feature selection

Schedule (July 2007)

Monday, 2
Time: Lecture: 9:30-10:30. Lab group 1: 15:30-16:45. Lab group 2: 17:00-18:15.
Slides 1, Introduction. Lecture: Introduction to Machine Learning. Basic learning machine. Principle of learning.
Exercise class: Introduction to the Spider: loading data, training and testing a simple model (e.g., Naïve Bayes). Use of a toy dataset. Description of the CLOP library.

Wednesday, 4
Time: Lecture 1: 9:00-9:50. Lecture 2: 10:00-10:50. Lecture 3: 11:30-12:20. Lab group 1: 15:30-16:45. Lab group 2: 17:00-18:15.
Slides 2, Overfitting. Lecture: Learning without overlearning. Overfitting avoidance, performance prediction, cross-validation.
Slides 3, Feature selection 1. Lecture: Introduction to feature selection. Filters, wrappers, and embedded methods.
Slides 4, Feature selection 2. Lecture: Embedded methods of feature selection. Learning theory put to work to build feature selection algorithms.
Exercise class: Play with the Dexter and Madelon datasets of the feature selection challenge. Apply naïve Bayes, ridge regression, SVM. Add filters.

Thursday, 5
Time: Lecture 1: 9:00-9:50. Lecture 2: 10:00-10:50. Lab group 1: 15:30-16:45. Lab group 2: 17:00-18:15.
Slides 5, Feature construction. Lecture: Feature construction. How to build better features with simple methods, convolutions, PCA, etc.
Slides 6, Causality. Lecture: Causality and feature selection. Limitations of methods of feature selection ignoring the data selection process.
Exercise class: Play with the Gisette dataset of the feature selection challenge. See how simple feature extraction methods can improve performance over the pure “agnostic” approach.

Friday, 6
Time: Panel: 11:30-13:20. Lab group 1: 15:30-16:45. Lab group 2: 17:00-18:15.
Slides 7. No lecture.
Exercise class: Install the latest CLOP version of the last challenge with R and Weka extensions. Choose any dataset from the feature selection challenge or the AL vs PK challenge. Play to match or outperform the best results (see ETH student results and NIPS model selection game).
