| | Domain | Name and Description | Size/type of data |
| 1 | Chemo-informatics | CHEMO: Library of small molecules coded with QSAR features. The task is to predict molecule toxicity. | 51440 examples available as molecule formulas or in feature representation (851 features). Two-class classification or regression. |
| 2 | Handwriting recognition | AVICENA: The task is to spot Arabic words in an ancient manuscript to facilitate indexing. | 35070 examples available as raw images or in feature representation (92 features). 15 classes. |
| 3 | Object recognition from still images | IMAR: The task is to label the image with the most prominent object(s) for indexing purposes. | Possibly use data from Caltech 256. Images could be preprocessed in a way that makes them difficult to identify. 30608 pictures, 2567 classes. |
| 4 | Situation recognition from video clips | SITUAR: The task is to recognize the occurrence of a given situation (pose or short action) in a short video clip, such as someone making a phone call, someone getting out of a car, or an animal crossing a road. | Possibly use KTH (2391 sequences, from 600 videos = 25 subjects x 6 actions x 4 scenarios) or the Hollywood dataset (3669 video clips from 69 movies, ~150 examples per class, 12 classes of actions). |
| 5 | Speech recognition | SPEAK: The task is word spotting: find the presence of a given word in a spoken sequence. | Possibly use the TIMIT database. Licensing issues? |
| 6 | Socio-economic data | SOCIO: The task will be to predict revenue using census data. | Publicly available census data, millions of entries available. |
| 7 | Text processing | PROTEXT: Classifying or ranking text based on queries. | A large publicly available dataset, possibly OpenTable. |
| 8 | Ecology data | SYLVESTER: Classification of forest cover. | 72626 examples coded by 12 real features and a large number of distractors. 2 classes. |
August 31, 2010: Deadline for NIPS 2010 demo proposals.
September 20, 2010: Deadline for NIPS 2010 demo proposals.
December 6-9, 2010: NIPS 2010 conference, Vancouver, Canada.
December-January 2010: First competition starts.
June 11-14, 2011: ICML 2011, Seattle, WA.
July 31-August 5, 2011: IJCNN 2011, San Jose, CA.
Links to related workshops/competitions
WCCI 2010 special session on active and autonomous learning. Discussion of the results of the active learning challenge.
AISTATS 2010 workshop on active learning and experimental design. Tutorial on experimental design by Donald Rubin. Papers presenting the results of the active learning challenge.
Active learning challenge: Using the virtual lab of the causality workbench for the first time, the participants could buy labels for virtual cash and monitor the tradeoff between getting good classification accuracy and spending a lot on getting labels.
NIPS 2009 causality and time series mini-symposium. Featuring a lecture in memory of Clive Granger by Halbert White.
NIPS 2008 causality workshop: objectives and assessment. The second challenge in causality organized by the causality workbench.
WCCI 2008 causation and prediction challenge. A first activity of the causality workbench.
NIPS 2006 workshop on causality and feature selection. The ancestor of this workshop.
IJCNN 2007 Agnostic learning vs. Prior knowledge challenge. “When everything fails, ask for additional domain knowledge” is the current motto of machine learning. Therefore, assessing the real added value of prior/domain knowledge is both a deep and a practical question. The participants competed in two tracks: the “prior knowledge track”, for which they had access to the raw data and information about the data, and the “agnostic learning track”, for which they had access to preprocessed data with no knowledge of the identity of the features.
WCCI 2006 performance prediction challenge. “How good are you at predicting how good you are?” 145 participants tried to answer that question. Cross-validation came out very strong. Can you do better? Measure yourself against the winners by participating in the model selection game.
NIPS 2003 workshop on feature extraction and feature selection challenge. We organized a competition on five data sets in which hundreds of entries were made. The web site of the challenge is still available for post-challenge submissions. Measure yourself against the winners! See the book we published, with a CD containing the datasets, tutorials, and papers on state-of-the-art methods.
Pascal challenges: The Pascal network is sponsoring several challenges in machine learning.
Data mining competitions: A list of data mining competitions maintained by KDnuggets, including the well-known KDD cup.
List of data sets for machine learning: A rather comprehensive list maintained by MLnet.
UCI machine learning repository: A great collection of datasets for machine learning research.
DELVE: A platform developed at
CAMDA: Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a contest with emphasis on gene selection, a special case of feature selection.
ICDAR: International Conference on Document Analysis and Recognition, a biennial conference proposing a contest in printed text recognition. Feature extraction/selection is a key component to win such a contest.
TREC: Text Retrieval Conference, organized every year by NIST. The conference is organized around the results of a competition. Past winners have had to address feature extraction/selection effectively.
ICPR: In conjunction with the International Conference on Pattern Recognition, ICPR 2004, a face recognition contest is being organized.
CASP: An important competition in protein structure prediction called Critical Assessment of Techniques for Protein Structure Prediction.
deeplearning@ clopinet . com.
US Naval Research Labs