The Westin Resort and Spa and the Hilton,
Whistler, B.C., December 9, 2006
** Game results available! **
Given a family of models with adjustable parameters, machine learning provides us with a means of "learning from examples" and obtaining a good predictive model. The problem becomes more arduous when the family of models possesses so-called hyper-parameters or when it consists of heterogeneous entities (e.g. linear models, neural networks, classification and regression trees, kernel methods, etc.) Both practical and theoretical considerations may lead to splitting the problem into multiple levels of inference. Typically, at the lower level, the parameters of individual models are optimized, and at the second level the best model is selected, e.g. via cross-validation.
In a recent workshop on model selection, where we discussed the results of the performance prediction challenge, we observed that many theoretically motivated methods have been proposed, but the simple 10-fold cross-validation method seems to give the best results. Yet everyone agrees that this method is very suboptimal.
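To make the 10-fold cross-validation baseline concrete, here is a minimal sketch of model selection by cross-validation. The toy dataset, the two candidate "models" (1-nearest-neighbour and a constant majority-class predictor), and all function names are illustrative, not taken from the challenge:

```python
# Model selection by 10-fold cross-validation: pick the candidate model
# with the lowest average held-out error rate. Purely illustrative.
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the sample indices and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_error(fit, X, y, k=10):
    """Average held-out error rate of one model over k folds."""
    folds = k_fold_indices(len(X), k)
    errors = []
    for fold in folds:
        train = [i for i in range(len(X)) if i not in fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(X[i]) != y[i] for i in fold)
        errors.append(wrong / len(fold))
    return sum(errors) / k

def fit_1nn(Xtr, ytr):
    """1-nearest-neighbour classifier on a 1-D input."""
    def predict(x):
        j = min(range(len(Xtr)), key=lambda i: abs(Xtr[i] - x))
        return ytr[j]
    return predict

def fit_majority(Xtr, ytr):
    """Constant classifier predicting the most frequent training label."""
    label = max(set(ytr), key=ytr.count)
    return lambda x: label

# Toy 1-D problem: class 1 iff x > 0.
X = [i / 10 - 1.0 for i in range(21)]
y = [int(x > 0) for x in X]

scores = {name: cross_val_error(fit, X, y)
          for name, fit in [("1nn", fit_1nn), ("majority", fit_majority)]}
best = min(scores, key=scores.get)
```

Second-level inference here is nothing more than the final `min` over cross-validation scores; the suboptimality discussed above comes from the variance of those score estimates, which this simple selection rule ignores.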
This workshop will revisit the problem of model selection, with the goal of bridging the gap between theory and practice. The topics of interest include:
Game
Part of the workshop will be devoted
to the results of a model selection game. The participants are provided
with a machine learning toolbox based on the Matlab toolkit "the Spider".
The toolkit provides a flexible way of building models by combining preprocessing,
feature selection, classification and postprocessing modules. Ensembles of
classifiers can also be built. The goal of the game is to build the best compound
model. In this constrained framework, the participants are encouraged to
focus on model selection, not on the development of new algorithms.
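CLOP itself is a Matlab toolbox; as a hypothetical analogue of the module chaining it provides, here is a minimal compound model in Python. The class names and interfaces (`Standardize`, `NearestCentroid`, `Chain`) are illustrative and are not CLOP's actual API:

```python
# A compound model as a chain of modules: preprocessing followed by a
# classifier, each exposing fit/transform or fit/predict. Illustrative only.

class Standardize:
    """Preprocessing: shift/scale each feature to zero mean, unit range."""
    def fit(self, X, y):
        cols = list(zip(*X))
        self.mean = [sum(c) / len(c) for c in cols]
        self.span = [max(c) - min(c) or 1.0 for c in cols]
        return self
    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.mean, self.span)]
                for row in X]

class NearestCentroid:
    """Classifier: predict the class whose centroid is closest."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(c) / len(c) for c in zip(*rows)]
        return self
    def predict(self, X):
        def dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.centroids, key=lambda l: dist(x, self.centroids[l]))
                for x in X]

class Chain:
    """Compound model: a sequence of transformers ending in a classifier."""
    def __init__(self, *steps):
        self.steps = steps
    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self
    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)

model = Chain(Standardize(), NearestCentroid())
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y = [0, 0, 1, 1]
preds = model.fit(X, y).predict(X)
```

In this framework, "model selection" means choosing which modules to chain and with which hyper-parameters, rather than writing new learning algorithms, which is exactly the constraint the game imposes.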
How to participate:
The game was open between October 1 and December 1, 2006.
It is now over, but you can still participate in the IJCNN 2007 Agnostic Learning vs. Prior Knowledge
challenge, which is a continuation of the game!
In addition to the rules of the challenge, the game has the following specific rules:
Results
Of the two prizes we had announced (best CLOP model and best non-CLOP Spider model), only one was awarded, because no non-CLOP Spider models were entered. Three entrants used CLOP models (shown in yellow in the table). The winner of the $500 award is Juha Reunanen.
| Entrant           | ADA | GINA | HIVA | NOVA | SYLVA | Ave. rank | Best entry              | Date     | Rev-submit |
|-------------------|-----|------|------|------|-------|-----------|-------------------------|----------|------------|
| Roman Lutz        | 1   | 1    | 5    | 1    | 4     | 2.4       | LogitBoost_with_trees   | 10/10/06 | 1          |
| Juha Reunanen     | 5   | 2    | 1    | 2    | 6     | 3.2       | cross-indexing-7        | 12/1/06  | 1          |
| H. Jair Escalante | 7   | 3    | 2    | 3    | 7     | 4.4       | BRun2311062             | 11/23/06 | 5          |
| J. Wichard        | 3   | 5    | 4    | 8    | 2     | 4.4       | mixed_tree_ensembles    | 10/27/06 | 3          |
| VladN             | 6   | 4    | 3    | 5    | 5     | 4.6       | RS1                     | 10/9/06  | 3          |
| Marc Boulle       | 2   | 7    | 7    | 6    | 1     | 4.6       | SNB(CMA)_+_100k_F(2D)_t | 11/21/06 | 1          |
| The Machine       | 4   | 6    | 6    | 9    | 3     | 5.6       | TMK                     | 11/14/06 | 5          |
| weseeare          | 8   | 8    | 8    | 4    | 8     | 7.2       | YAT                     | 11/25/06 | 1          |
| pipibjc           | 9   | 9    | 9    | 7    | 9     | 8.6       | naiveBayes_Ensemble     | 11/10/06 | 1          |
In the table, we show the rank of every entrant for each of the five datasets as of December 1st, 2006. Each entrant's rank corresponds to his or her best entry among the last 5 entries. The entrants are sorted by average rank. The last column shows the reverse order number of each entrant's submissions (i.e. 1 means the last entry, 2 the second-to-last entry, etc.) The test set performances will not be revealed until March 1st, 2007.
Workshop participation
Participation in the workshop is not conditional on entering the game. Likewise, game entrants are neither required to attend the workshop nor to publish the methods they employed. Game entrants may remain anonymous during the development period, but only identified entrants will be included in the final competition ranking.
To make a presentation at the workshop, please contact
the workshop chair with your proposal. Proposals will be selected on the
basis of:
Deadline: November 15, 2006 (OVER!)
The best contributions will be invited to submit a paper to a special topic of the Journal of Machine Learning Research. Participants are also encouraged to submit negative results to the Journal of Interesting Negative Results.
Morning session 7:30am-10:30am
Model selection game
Chair: Gavin Cawley
7:30am-8:00am - Benchmark datasets and game result summary
[Slides]
Isabelle Guyon, Amir Saffari, Gideon
Dror, Gavin Cawley, Olivier
Guyon
8:00am-8:30am - Implementation of baseline methods. [Slides] Gavin Cawley
8:30am-9:15am - Results on the Model Selection Game: Towards
a Particle Swarm Model Selection Algorithm
; [Slides]
H. Jair Escalante
9:15am Break
9:30am-10:00am Model selection for Gaussian Processes ; [Slides]
Chris Williams
See also: chapter 5 in Gaussian Processes
for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams
MIT Press, 2006
http://www.gaussianprocess.org/gpml/
10:00am-10:30am Stability of bagged decision trees ; [Slides] ; [Paper]
Yves Grandvalet
@incollection{Grandvalet06b,
Author = {Grandvalet, Y.},
Title = {Stability of Bagged Decision Trees},
Booktitle = {Proceedings of the XLIII Scientific Meeting of the Italian
Statistical Society},
Pages = {221--230},
Publisher = {CLEUP},
Year = {2006}
}
Afternoon session 3:30pm-6:30pm
Multi-level inference
Chair: Isabelle Guyon
3:30-4:15pm Tutorial on mathematical programming
for multi-level optimization ; [Slides]
Kristin Bennett
4:15-4:45 pm Convex optimization
approaches for model selection ; [Slides]
Kristiaan Pelckmans and Johan Suykens
See also:
- Pelckmans K., Suykens J.A.K., De Moor B., “A Convex Approach to Validation-based
Learning of the Regularization Constant”, accepted for publication in IEEE
Transactions on Neural Networks
- Pelckmans K., Primal-Dual Kernel Machines, Ph.D. thesis, Faculty of
Engineering, K.U.Leuven (Leuven, Belgium), May 2005, 280 pp.
- http://homes.esat.kuleuven.be/~kpelckma/research/
4:45pm Break
5:00-5:30 pm Bayesian
regularization in model selection ; [Slides]
Gavin Cawley
5:30-6:00 pm On Model Selection
in Clustering ; [Paper]
[Slides]
Volker Roth and Tilman Lange
See also:
Stability-Based
Validation of Clustering Solutions
http://neco.mitpress.org/content/vol16/issue6/.
Tilman Lange, Volker Roth, Mikio L. Braun and Joachim M. Buhmann,
Neural Computation, 16(6):1299 -- 1323, 2004.
6:00pm-6:30pm Debate. Impromptu talks / right of answer.
WCCI performance prediction challenge. How good are you at predicting how good you are? 145 participants tried to answer that question. Cross-validation came out very strong. Can you do better? Measure yourself against the winners by participating in the model selection game.
NIPS 2003 workshop on feature extraction and feature selection challenge. We organized a competition on five data sets, in which hundreds of entries were made. The web site of the challenge is still available for post-challenge submissions. Measure yourself against the winners! See the book we published, with a CD containing the datasets, tutorials, and papers on state-of-the-art methods.
Pascal challenges: The Pascal network is sponsoring several challenges in Machine learning.
Data mining competitions:
A list of data mining competitions maintained
by KDnuggets, including the well known KDD cup.
List of data sets for machine learning: A rather comprehensive list maintained by MLnet.
UCI machine learning repository: A great collection of datasets for machine learning research.
DELVE: Data for Evaluating Learning in Valid Experiments, a platform developed at the University of Toronto.
CAMDA
Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a contest with emphasis on gene selection, a special case of feature selection.
ICDAR
International Conference on Document Analysis and Recognition, a biennial conference proposing a contest in printed text recognition. Feature extraction/selection is a key component to winning such a contest.
TREC
Text REtrieval Conference, organized every year by NIST. The conference is organized around the results of a competition. Past winners have had to address feature extraction/selection effectively.
ICPR
In conjunction with the International Conference on Pattern Recognition, ICPR 2004, a face recognition contest was organized.
CASP
An important competition in protein structure
prediction called Critical Assessment of
Techniques for Protein Structure
Prediction.
Workshop chair:
Isabelle Guyon
Clopinet Enterprises
955,
Tel/Fax: (510) 524 6211
Collaborators and advisors: Amir Reza Saffari Azar (Graz University of Technology), Gökhan Bakır (MPI for Biological Cybernetics), Kristin Bennett (Rensselaer Polytechnic Institute), Gavin Cawley (University of East Anglia), Gideon Dror (Academic College of Tel-Aviv-Yaffo), Olivier Guyon (MisterP services), Joachim Buhmann (ETH, Zurich), and Lambert Schomaker (University of Groningen).