December 13 and 14, 2002
Delta Whistler Resort, British Columbia, Canada
Challenge
In mathematics and theoretical computer
science, exhibiting counterexamples is part of the established scientific
method for ruling out wrong hypotheses. Yet negative results and counterexamples
are seldom reported in experimental papers, although they can be very valuable.
Our workshop will be a forum to freely discuss negative results and introduce
the community to challenging open problems. This may include reporting:
- experimental results of principled algorithms
that obtain poor performance compared to seemingly dumb heuristics;
- experimental results that falsify an
existing theory;
- counterexamples to a generally accepted
conjecture;
- failure to find a solution to a given
problem after various attempts;
- failure to demonstrate the advantage
of a given method after various attempts.
Submission
Prospective participants are invited
to submit a one- or two-page summary. Theory, algorithm, and application
contributions are welcome. We also welcome tutorials or historical presentations
on negative results and counterexamples that pushed the frontiers of neural
network and machine learning research, as well as tutorials on scientific methodology
that makes use of negative results and counterexamples.
In preparing your submission, please
remember that reporting negative results and counterexamples does not mean
reporting inconclusive results. One may report experiments that failed
because of an invalid design or an invalid theory, provided that a tentative analysis
of the reasons for the failure is given and the subject matter is potentially
of interest to others. But the failure of an experiment will be considered
a potentially interesting negative result only if some conclusions can be
drawn.
If you are introducing the community
to a new open problem, it is desirable that you provide both (i) a high level
introduction stating the context of the problem and its fundamental and/or
practical importance, and (ii) a formal mathematical statement of the problem,
if applicable.
Email submissions to: isabelle@clopinet.com
Schedule
Saturday morning session
7:30 am Welcome and introduction, Isabelle Guyon.
7:45 am On the impossibility of learning a continuous
distribution in a covariant way, Timothy Holy and Ilya Nemenman.
We discuss the question of whether it
is possible to infer a continuous probability density and other quantities
in a reparameterization-covariant way. An explicitly constructed reparameterization
example gives a negative answer to this question. The conclusion does not
depend on the particular learning scenario used. We present arguments that
explain and further strengthen the result. Finally, we argue that approximate
reparameterization invariance with respect to a class of "weak" reparameterizations
is possible, and that the quality of the approximation depends on the number of
samples, as well as on the assumptions made about the probability densities involved.
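As a toy illustration of why coordinate choices matter (not taken from the talk; only the standard change-of-variables rule is assumed), the sketch below checks by Monte Carlo that differential entropy, a quantity one might hope to infer from samples, shifts under a reparameterization y = f(x), so it cannot be estimated in a coordinate-free way:

```python
import math
import random

# For y = f(x), differential entropy obeys H[y] = H[x] + E[log |f'(x)|],
# so it is not reparameterization-invariant. Take x ~ Uniform(0, 1),
# for which H[x] = 0, and the reparameterization y = x**2, f'(x) = 2x.
# Then H[y] = E[log 2x] = log 2 - 1: the "same" distribution carries a
# different entropy in the new coordinates.
random.seed(0)
n = 100_000
samples = [random.random() for _ in range(n)]

# Monte Carlo estimate of E[log |f'(x)|], i.e. the entropy shift.
entropy_shift = sum(math.log(2.0 * x) for x in samples) / n
exact = math.log(2.0) - 1.0   # closed form, roughly -0.307
```

The estimate converges to log 2 - 1, a nonzero shift: any sample-based entropy estimator must give different answers in the two coordinate systems.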
8:15 am There
is no unbiased estimator of the variance of K-fold cross-validation, Yoshua Bengio.
K-fold cross-validation is probably
the most commonly used method to estimate generalization error (or to perform
model selection), especially when there is little training data. In order
to compare learning algorithms, it is important to estimate the uncertainty
around the estimate obtained by cross-validation. Such uncertainty estimates
(either the variance, a confidence interval, or a p-value against the null
hypothesis of no difference) are increasingly required by reviewers
of machine learning papers involving experimental validation of new learning
algorithms. Unfortunately, we can prove the following negative result (and
we will explain its basis): there exists no universally (for all distributions)
unbiased estimator of the variance of the K-fold cross-validation generalization
error estimator, using only the outcome of the K-fold cross-validation experiment
(i.e. the individual errors). However, understanding the source of this problem
can hopefully help us choose among a variety of candidates, or resort to
estimators based on multiple K-fold cross-validations.
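To fix notation, here is a minimal sketch of the quantities involved (a toy "predict the training mean" model; names and setup are illustrative, not taken from the talk): the n individual errors produced by one K-fold run, their mean (the cross-validation estimate), and the naive variance one might be tempted to attach to that mean:

```python
import random
import statistics

def kfold_errors(data, k=5):
    """Return the n individual test errors (squared error) from one
    K-fold cross-validation of a toy 'predict the training mean' model."""
    data = data[:]                      # shuffle a copy into random folds
    random.shuffle(data)
    errors = []
    for fold in range(k):
        test = data[fold::k]            # indices i with i % k == fold
        train = [x for i, x in enumerate(data) if i % k != fold]
        mu = statistics.fmean(train)    # the "model": the training mean
        errors.extend((x - mu) ** 2 for x in test)
    return errors

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
e = kfold_errors(data, k=5)

cv_estimate = statistics.fmean(e)           # K-fold CV error estimate
naive_var = statistics.variance(e) / len(e) # variance as if errors were i.i.d.
# The talk's point: the n errors are NOT independent (test points share
# training folds, and each point's error depends on overlapping training
# sets), so no function of e alone, including naive_var, can be an
# unbiased variance estimator for all distributions.
```

The naive formula is exactly the kind of candidate the negative result rules out as universally unbiased.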
8:45 am On the number
of modes of a Gaussian mixture, Miguel A. Carreira-Perpinan and Chris Williams.
We consider the following question:
given a Gaussian mixture in D dimensions with M components, what is the maximum
number of modes that it can have? As far as we know, the answer to this is
only known for particular types of mixture and/or particular values of D
and M. The question remains open in general. We conjecture that if all the
covariances of the Gaussian mixture are isotropic or equal to each other,
then it can have at most M modes. This intuitive conjecture does not hold
when the covariances are non-isotropic. We will review some related results
in statistics and scale space theory and also discuss algorithms that attempt
to find all modes of a Gaussian mixture. Aside from its theoretical relevance,
the problem is practically important for statistical machine learning, in
models such as kernel density estimation or the generative topographic mapping,
and in algorithms such as a recent method for sequential data reconstruction
or mean-shift algorithms for clustering.
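For the isotropic, equal-covariance case, one such mode-finding procedure is the mean-shift fixed-point iteration. The sketch below (an illustrative 1-D implementation with equal weights, not code from the talk) starts a hill climb from every component mean and merges the converged points:

```python
import math

def gm_modes(means, sigma=1.0, tol=1e-8, max_iter=500, merge_tol=1e-3):
    """Find the modes of an equal-weight, isotropic 1-D Gaussian mixture
    by running the mean-shift fixed point x <- sum_i w_i(x) mu_i,
    with w_i(x) proportional to N(x; mu_i, sigma^2), from each mean."""
    modes = []
    for x in means:
        for _ in range(max_iter):
            w = [math.exp(-0.5 * ((x - m) / sigma) ** 2) for m in means]
            x_new = sum(wi * mi for wi, mi in zip(w, means)) / sum(w)
            if abs(x_new - x) < tol:
                break
            x = x_new
        # Merge converged points that land on an already-found mode.
        if all(abs(x - m) > merge_tol for m in modes):
            modes.append(x)
    return sorted(modes)
```

For example, gm_modes([0.0, 5.0]) finds two well-separated modes, while gm_modes([0.0, 0.5]) merges both climbs into the single mode near 0.25: in the isotropic case an M-component mixture can have fewer than M modes, consistent with the at-most-M conjecture.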
9:15 am Break.
9:30 am Discussion:
Negative results.
Informally share negative results. Get
feedback. Learn what you do not need to try.
Saturday afternoon session
4:00 pm Panel:
Suggested Directions of Research. Y. Bengio, N. Tishby, and other panelists TBA.
5:00 pm Discussion: Open problems.
Informally share open problems. Have
they been solved already? Get new ideas.
6:00 pm Break.
6:10 pm Impromptu
talks.
Please contact I. Guyon in the morning
if you want to present in this session.
6:50 pm Closing remarks.
Open problems and suggested directions
of research will be summarized.
Links
Journal
of Interesting Negative Results in Natural Language
Processing and Machine Learning. JINR is an electronic
journal that gives a voice to negative results which stem from intuitive
and justifiable ideas, proven wrong through thorough and well-conducted experiments.
It also encourages the submission of short papers/communications presenting
counterexamples to commonly accepted conjectures or to published papers.
Forum for Negative Results
in the Journal of Universal Computer Science
Current Computer Science research is
primarily focused on solving engineering problems. Often, though, promising
attempts to solve a particular problem fail for unavoidable reasons.
Due to the current CS publication climate, such negative results today are
usually camouflaged as positive results, by not evaluating or by mis-evaluating
the research, or by redefining the problem to fit the solution.
Science
Makes Much Ado About Nothing
After decades of shelving studies with negative
results, researchers around the nation are agog about not one, but two new
journals that focus only on studies that demonstrate what doesn't work.
Journal
of Negative Observations in Genetic Oncology
In the pursuit of genes whose mutations drive
the development of human cancers, most of the candidate genes will elicit
negative results -- i.e., no mutations will be found. The dissemination of
negative data is therefore a crucial component of a lean strategic plan for
the genetic analysis of cancer.
Journal
of Negative Results in Biomedicine
This open access, online journal publishes
papers on all aspects of unexpected, controversial, provocative and/or negative
results/conclusions in the context of current tenets, providing scientists
and physicians with responsible and balanced information to support informed
experimental and clinical decisions.
Journal
of Articles in Support of the Null Hypothesis
In the past, other journals and reviewers
have exhibited a bias against articles that did not reject the null hypothesis.
We plan to change that by offering an outlet for experiments that do not reach
the traditional significance levels (p<.05), thus reducing the file-drawer
problem and the bias in the psychological literature. Without such
a resource, researchers could be wasting their time examining empirical questions
that have already been examined.
Index of Null Effects and Replication
Failures
The iNERF is an index of short
1-2 page descriptions of experiments or replications that did not meet the
traditional level of significance. The purpose of the iNERF is to provide
researchers with an opportunity to disseminate information about their null
studies without having to take up precious time writing up a full manuscript.
Teaching Problem
Solving, Hypothesis Testing, Evolution, and the Meaning of Life Through the
Marine Insects Question
The question of why insects are not
as dominant at sea as they are on land is ideal for teaching how to form
and evaluate scientific questions. Once a hypothesis is formed, we look for
current examples that would contradict it. For example, the argument that
insects can’t survive in the ocean because of water pressure doesn’t seem
so good when you realize one insect species survives at a depth of 1,300 meters!
Eliminating hypotheses by counterexamples is a powerful approach to assessing
hypotheses.
The Seven Open Problems of the
Clay Mathematics Institute
In order to celebrate mathematics in
the new millennium, The Clay Mathematics Institute of Cambridge, Massachusetts
(CMI) has named seven “Millennium Prize Problems” with $1 million allocated
to each. One hundred years earlier, on August 8, 1900, David Hilbert delivered
his famous lecture about open mathematical problems at the second International
Congress of Mathematicians in Paris. This influenced CMI's decision to announce
the millennium problems. The clear and precise way the problems are exposed
is inspiring.
Machine Learning
Research: Four Current Directions
Tom Dietterich - 1997.
Machine Learning research has been making
great progress in many directions. This article summarizes four of these directions
and discusses some current open problems. The four directions are (a) improving
classification accuracy by learning ensembles of classifiers, (b) methods
for scaling up supervised learning algorithms, (c) reinforcement learning,
and (d) learning complex stochastic models. The last five
years have seen an explosion in machine learning research.
Contact information
Isabelle Guyon
Clopinet Enterprises
955, Creston Road,
Berkeley, CA 94708, U.S.A.
Tel/Fax: (510) 524 6211
isabelle@clopinet.com