This book harvests three years of effort by hundreds of researchers who
participated in three competitions we organized around five datasets from
various application domains. Three aspects were explored:
- Data representation. With the proper data representation, learning becomes
almost trivial. For the defenders of fully automated data processing, the
search for better data representations is just part of learning. At the other
end of the spectrum, domain specialists engineer data representations
tailored to particular applications. The book discusses the results of the
"Agnostic Learning vs. Prior Knowledge" challenge and includes the best
papers from the IJCNN 2007 workshop on "Data Representation Discovery", where
the best competitors presented their results.
- Model selection. Given a family of models with adjustable parameters,
Machine Learning provides us with means of "learning from examples" and
obtaining a good predictive model. The problem becomes more arduous when the
family of models possesses so-called hyper-parameters or when it consists of
heterogeneous entities (e.g. linear models, neural networks, classification
and regression trees, kernel methods, etc.). Both practical and theoretical
considerations may lead one to split the problem into multiple levels of
inference. Typically, at the lower level, the parameters of individual models
are optimized, and at the second level the best model is selected, e.g. via
cross-validation. This problem is often referred to as model selection. The
results of the "Model Selection Game" are included in the book, as well as
the best papers of the NIPS 2006 "Multi-level Inference" workshop.
- Performance prediction. In most
real-world situations, it is not sufficient to provide a good predictor; it
is important to assess accurately how well this predictor will perform on new
unseen data. Before deploying a model in the field, one must know whether it
will meet the specifications or whether one should invest more time and
resources to collect additional data and/or develop more sophisticated
models. The performance prediction challenge asked participants to provide
prediction results on new unseen test data AND to predict how good these
predictions were going to be on a test set for which they did not know the
labels ahead of time. Participants therefore had to design both a good
predictive model and a good performance estimator. The results of the
"Performance Prediction Challenge" and the best papers of the "WCCI 2006
workshop on model selection" are included in the book. The best papers of a
special topic of JMLR on model selection, including longer contributions of
the best challenge participants, are also reprinted in the book.

The book is
a valuable resource for students, teachers, researchers and engineers in
machine learning, data mining and statistics. We are also making available
the challenge datasets and sample Matlab code. The book is distributed for
free in PDF format and is available in print, at printing cost (USD 30), from
Amazon and Barnes and Noble.
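The two-level scheme described above (parameters fitted at the lower level, the best model chosen by cross-validation at the second level) and the performance prediction task can be sketched together in a few lines. This is only an illustrative sketch under stated assumptions: the data are synthetic rather than the challenge datasets, the candidate models and all function names (`make_data`, `fit_centroid`, `make_knn`, `cv_error`) are hypothetical, the plain error rate is an assumed metric, and none of this is taken from the sample Matlab code.

```python
import random

def make_data(n, rng):
    """Synthetic two-class data: 2-D Gaussian blobs around (-1,-1) and (+1,+1)."""
    data = []
    for _ in range(n):
        y = rng.choice([0, 1])
        c = 1.0 if y == 1 else -1.0
        data.append(((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), y))
    return data

def fit_centroid(train):
    """Lower level: fit the parameters (one centroid per class) of one model."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x1, x2), y in train:
        s = sums[y]
        s[0] += x1
        s[1] += x2
        s[2] += 1
    cents = {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items() if s[2] > 0}

    def predict(x):
        return min(cents, key=lambda y: (x[0] - cents[y][0]) ** 2
                                        + (x[1] - cents[y][1]) ** 2)
    return predict

def make_knn(k):
    """k-nearest-neighbour family; k (odd) plays the role of a hyper-parameter."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda p: (p[0][0] - x[0]) ** 2
                                                  + (p[0][1] - x[1]) ** 2)[:k]
            votes = sum(y for _, y in nearest)
            return 1 if 2 * votes > len(nearest) else 0
        return predict
    return fit

def error_rate(predict, data):
    """Fraction of misclassified examples."""
    return sum(predict(x) != y for x, y in data) / len(data)

def cv_error(fit, data, folds=5):
    """Second level: k-fold cross-validation estimate of the error rate."""
    errs = []
    for i in range(folds):
        val = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        errs.append(error_rate(fit(train), val))
    return sum(errs) / folds

rng = random.Random(0)
train, test = make_data(200, rng), make_data(1000, rng)

# Heterogeneous candidates: a linear-style model and two k-NN settings.
candidates = {"centroid": fit_centroid, "1-NN": make_knn(1), "15-NN": make_knn(15)}
cv_scores = {name: cv_error(fit, train) for name, fit in candidates.items()}

best = min(cv_scores, key=cv_scores.get)          # model selection
guess = cv_scores[best]                           # predicted test error
actual = error_rate(candidates[best](train), test)  # error on held-out test set
print(f"selected={best} guessed_error={guess:.3f} test_error={actual:.3f}")
```

The gap between `guess` and `actual` is what a performance estimator tries to keep small. Note that the minimum cross-validation score over several candidates tends to be slightly optimistic, which is one reason why estimating performance is harder than it first appears.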
The validation set labels are now available for the agnostic learning track and the prior knowledge track (they became available to the participants midway through the challenge). All datasets are stored in simple text formats. Sample Matlab code is available to read the data and format the results. The results must be uploaded to the challenge web site for scoring. See the example of result archive.
Acknowledgements: We are very thankful to the institutions who originally gave the data. A report describing the datasets and giving credit to the data donors is available.
Results
The results of the competitions are available on-line:
- Performance Prediction Challenge results.
- Agnostic Learning vs. Prior Knowledge challenge and model selection game:
  - December 1st, 2006: Results of the model selection game. [Slides]
  - March 1st, 2007: Competition deadline ranking. [Slides] The March 1st results prompted us to extend the competition deadline because the participants in the "prior knowledge track" are still making progress, as indicated by the learning curves. Presently the prior knowledge track obtains slightly better results than the agnostic learning track, but the differences are not very significant.
  - August 1st, 2007: Final ALvsPK challenge results.
Links
- Performance prediction: WCCI 2006 workshop on model selection and performance prediction challenge. We organized a competition on model selection and the prediction of generalization performance. How good are you at predicting how good you are?
- Model selection: NIPS 2006 workshop on multi-level inference and model selection game. We organized a game of model selection using the same datasets as the "Performance prediction challenge" but restricting people to using models from a provided toolbox.
- Preprocessing: IJCNN 2007 Data representation discovery workshop and Agnostic learning vs. Prior knowledge challenge. “When everything fails, ask for additional domain knowledge” is the current motto of machine learning. Therefore, assessing the real added value of prior/domain knowledge is both a deep and a practical question. The participants competed in two tracks: the “prior knowledge track”, for which they had access to the raw data and information about the data, and the “agnostic learning track”, for which they had access to preprocessed data with no knowledge of the identity of the features.
- Other competitions: ChaLearn keeps organizing new competitions; check them out!
This material is based upon work supported by the National Science
Foundation under Grant No. ECCS-0424142 and Grant No. ECCS-0736687. Any
opinions, findings, and conclusions or recommendations expressed in this
material are those of the authors and do not necessarily reflect the views of
the National Science Foundation.