*** Pot-luck challenge: Deadline extended to November 19, 2008 ***

The goal of this workshop is to discuss new approaches to causal discovery from empirical data, their applications, and methods to evaluate their success. Emphasis will be put on the definition of objectives to be attained and on assessment methods to evaluate proposed solutions. Participants are encouraged to take part in a "competition pot-luck" in which datasets and problems are exchanged and solutions proposed.


Motivation


Machine learning has traditionally been focused on prediction. Given observations that have been generated by an unknown stochastic dependency, the goal is to infer a law that will be able to correctly predict future observations generated by the same dependency. Statistics, in contrast, has traditionally focused on "data modeling", i.e., on the estimation of a probability law that has generated the data.

During recent years, the boundaries between the two disciplines have become blurred and both communities have adopted methods from the other. However, it is probably fair to say that neither of them has yet fully embraced the field of causal modeling, i.e., the detection of causal structure underlying the data. There are probably several reasons for this. Many statisticians would still shy away from developing and discussing formal methods for inferring causal structure, other than through experimentation, as they would traditionally think of such questions as being outside statistical science and internal to any science where statistics is applied. Researchers in machine learning, on the other hand, have for too long focused on a limited set of problems, shying away from non-i.i.d. data and problems of distribution shift between training and test sets, neglecting the mechanisms underlying the generation of the data, including issues like stochastic dependence, and all too often neglecting statistical tools like hypothesis testing, which are crucial to current methods for causal discovery.

Since the Eighties there has been a community of researchers, mostly from statistics and philosophy, who in spite of the prevailing views described above have developed methods aiming at inferring causal relationships from observational data, building on the pioneering work of Glymour, Scheines, Spirtes, and Pearl. While this community has remained relatively small, it has recently been complemented by a number of researchers from machine learning. This introduces a new viewpoint to the issues at hand, as well as a new set of tools, including algorithms for causal feature selection, nonlinear methods for testing statistical dependencies using reproducing kernel Hilbert spaces, and methods derived from independent component analysis.

Presently, there is a profusion of algorithms being proposed, mostly evaluated on toy problems. One of the main challenges in causal learning consists in developing strategies for an objective evaluation. This includes, for instance, methods for acquiring large representative data sets with known ground truth. This, in turn, raises the question of to what extent the regularities observed in these data sets also apply to the relevant data sets where the causal structure is unknown, because data sets with known ground truth may not be representative.


Second causality challenge: competition pot-luck
We are organizing a new causal discovery challenge whose purpose is to exchange datasets and benchmark causal discovery algorithms.

The participants can either contribute a task (a dataset and problem) or submit results on the tasks proposed.


The deadline for submitting results or datasets is November 19, 2008. The participants must also submit a paper by November 21, 2008 to compete for a prize.
Prizes: Four free NIPS 2008 workshop registrations (one for the best contributed task and three for the best contributed solutions), or a cash prize of 200 USD.
The prizes will be determined on the basis of the papers and the challenge results, by a vote of the co-chairs, co-organizers and advisors.

See http://clopinet.com/causality for details.


Participation
Participation in the workshop is not conditional on entering the challenge. Likewise, challenge entrants are not required to attend the workshop(s) or to publish the methods they employed.


To present a poster at the workshop, please submit a 200-word abstract.

To present a paper or to be published in the proceedings, please submit a 6-page paper.

Submissions should be emailed to causality @ clopinet . com before November 21, 2008.


The proceedings of the workshop will be compiled as an issue of the JMLR workshop proceedings. A sample paper and a LaTeX style file are provided.


Specific areas relevant to the workshop(s) include, but are not limited to:

a. Methods to discover causal structure from data and to perform causal inference (e.g., estimate causal effects, predict effects of actions, produce most probable causal explanations, perform inference with counter-factuals, etc.). Methods based on the use of multiple types of data (e.g., observational, experimental, case control) and methods based on combining knowledge (e.g., in the form of constraints or prior beliefs) and data, are encouraged. Such methods may be based on Bayesian Networks and other Probabilistic Graphical Models, Markov Decision Processes, Structural Equation Models, Propensity Scoring, Information Theory, Granger Causality, or other appropriate frameworks.

 

b. Theory:

- Operational definitions of causality suitable for practical causal discovery.

- Formal criteria (e.g., statistical tests of significance of causal relationships, model scoring measures) for causal model selection.

- Properties (e.g., soundness/consistency, stability, sample efficiency, computational efficiency) of existing and novel causal discovery methods.

- Statistical complexity and feasibility of learning causal relationships under different assumptions.

- Formal connections relevant to causal discovery among diverse fields such as Artificial Intelligence, Decision Theory, Econometrics, Markov Decision Processes, Control Theory, Operations Research, Planning, Experimental Design theory, etc.

 

c. Characterization of causal interpretability of non-causal machine learning and statistical methods, especially feature selection methods using theoretical and empirical approaches:

- Characterizing major existing and novel causally and non-causally-motivated feature selection methods in terms of causal validity.

- Studying the concept of relevancy and its relationship with causality.

- Causal feature selection methods with improved computational performance and accuracy, suitable for high-dimensional problems and/or small sample sizes.

 

d. Assumptions for causal discovery. Theoretical and empirical study of:

- Violations of typical assumptions for causal discovery (e.g., Causal Faithfulness Condition, Causal Markov Condition, Causal Sufficiency, causal graph sparseness, linearity, specific parametric forms of data distributions, etc.).

- Prevalence and severity of violations of assumptions and study of worst-case and average-case effects of such violations.

- Novel or modified assumptions and their properties.

 

e. Evaluation methods, including the study of appropriate performance measures, research designs, benchmarks etc. to empirically study the performance and pros and cons of causal discovery methods.

 

f. Real-world applications and benchmarking of causal discovery algorithms, including rigorous studies of highly innovative software environments for causal discovery.

Schedule

September 15, 2008: challenge start.

October 15, 2008: deadline for (optional) submission of milestone challenge results.
October 20, 2008: public anonymous release of milestone result analysis.
October 24, 2008: workshop abstracts due.

November 19, 2008: challenge ends (last day to submit tasks or challenge results).

November 20, 2008: challenge results released to participants.
November 21, 2008: JMLR proceedings paper submission deadline. Abstracts for poster presentation still accepted until that date.
December 1, 2008: paper notification of acceptance.

December 12, 2008: challenge results publicly released; workshop.


Preliminary workshop schedule:


Morning. Bernhard Schölkopf, chair


7.30 -  8.00    Welcome, program presentation, and poster highlights (Dominik Janzing)
8.00 -  9.00    Tutorial  / overview
9.00 -  9.15    Competition results (Isabelle Guyon)
9.15 -  9.30    Data exchange, and benchmarks (Patrik Hoyer)

9.30 - 10.30    Poster viewing, coffee, informal discussions

Afternoon. Dominik Janzing, chair

4.00 - 4.30    Keynote talk, historical and philosophical perspective
4.30 - 4.45    Contributed talk     
4.45 - 5.00    Contributed talk     
5.00 - 5.15    Contributed talk  
5.15 - 5.30    Contributed talk    
5.30 - 5.45    Invited talk, positive perspective

5.45 - 6.00    Invited talk, skeptical perspective

6.00 - 7.00    Plenary discussion (Isabelle Guyon moderator)


Invited speakers: Phil Dawid (University of Cambridge), Kevin Murphy (University of British Columbia), Judea Pearl (UCLA), Thomas Richardson (University of Washington), Donald Rubin (Harvard University), and Richard Scheines (Carnegie Mellon University).

Links to related workshops/competitions


WCCI 2008 causation and prediction challenge. A first activity of the causality workbench team, which is bringing you this new challenge.

NIPS 2006 workshop on causality and feature selection. The ancestor of this workshop.

IJCNN 2007 Agnostic learning vs. Prior knowledge challenge. “When everything fails, ask for additional domain knowledge” is the current motto of machine learning. Therefore, assessing the real added value of prior/domain knowledge is both a deep and a practical question. The participants competed in two tracks: the “prior knowledge track”, for which they had access to the raw data and information about the data, and the “agnostic learning track”, for which they had access to preprocessed data with no knowledge of the identity of the features.

WCCI 2006 performance prediction challenge. “How good are you at predicting how good you are?” 145 participants tried to answer that question. Cross-validation proved very strong. Can you do better? Measure yourself against the winners by participating in the model selection game.

NIPS 2003 workshop on feature extraction and feature selection challenge. We organized a competition on five data sets in which hundreds of entries were made. The web site of the challenge is still available for post-challenge submissions. Measure yourself against the winners! See the book we published with a CD containing the datasets, tutorials, and papers on state-of-the-art methods.

Pascal challenges: The Pascal network is sponsoring several challenges in Machine learning.

Data mining competitions:
A list of data mining competitions maintained by KDnuggets, including the well known KDD cup.

List of data sets for machine learning:
A rather comprehensive list maintained by MLnet.

UCI machine learning repository: A great collection of datasets for machine learning research.

DELVE: A platform developed at the University of Toronto to benchmark machine learning algorithms.

CAMDA
Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a contest with emphasis on gene selection, a special case of feature selection.

ICDAR
International Conference on Document Analysis and Recognition, a biennial conference proposing a contest in printed text recognition. Feature extraction/selection is a key component to win such a contest.

TREC
Text Retrieval conference, organized every year by NIST. The conference is organized around the result of a competition. Past winners have had to address feature extraction/selection effectively.

ICPR
In conjunction with the International Conference on Pattern Recognition, ICPR 2004, a face recognition contest is being organized.

CASP
An important competition in protein structure prediction called Critical Assessment of Techniques for Protein Structure Prediction.

Contact information

causality @ clopinet . com

Co-chairs: 
Isabelle Guyon (Clopinet), Dominik Janzing and Bernhard Schoelkopf (Max Planck Institute for Biological Cybernetics).

Co-organizers and advisors:
Constantin F. Aliferis (Vanderbilt University), Gregory F. Cooper (University of Pittsburgh), André Elisseeff (IBM Research), Patrik Hoyer (University of Helsinki), Klaus-Robert Müller (Fraunhofer-Institut), Jean-Philippe Pellet (IBM/ETH, Zurich), Peter Spirtes (Carnegie Mellon University), Alexander Statnikov (Vanderbilt University).

Sponsors

Pascal, Microsoft, ETH, MPS, Clopinet, NSF

This project is supported by the National Science Foundation under Grant No. ECCS-0725746. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.