The Causality Workbench Team - Isabelle Guyon, Constantin Aliferis,
Greg Cooper, André Elisseeff, Jean-Philippe Pellet, Peter Spirtes
For NIPS 2008, we organized a challenge in causality. The challenge was
built around a number of proposed tasks, but the participants were encouraged
to bring their own problems (hence the name "pot-luck"). The initial five
tasks were:
- CYTO:
Causal Protein-Signaling Networks in human T cells. Learn a protein signaling
network from multicolor flow cytometry data. N=11 proteins, P~800 samples
per experimental condition. E=9 conditions.
- LOCANET:
LOcal CAusal NETwork. Find the local causal structure around a given target
variable (depth 3 network) in REGED, CINA, SIDO, MARTI.
- PROMO:
Simulated marketing task. Time series of 1000 promotion variables and 100
product sales. Predict a 1000x100 boolean influence matrix, indicating for
each (i,j) element whether the ith promotion has a causal influence on the
sales of the jth product. Data is provided as time series, with a daily value
for each variable for three years.
- SIGNET:
Abscisic Acid Signaling Network. Determine the set of 43 boolean rules that
describe the interactions of the nodes within a plant signaling network. 300
separate Boolean pseudodynamic simulations of the true rules. Model inspired
by a true biological system.
- TIED:
Target Information Equivalent Dataset. Illustrates a case in which there are
many equivalent Markov boundaries. Find them all.
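As a toy illustration of what SIGNET-style data look like, the sketch below simulates a small Boolean network from random initial states. The three rules, the node names, and the synchronous update scheme are hypothetical and chosen for brevity; they are not the SIGNET rules or its simulation procedure.

```python
import random

# Hypothetical three-node rule set for illustration only (NOT the SIGNET rules).
# Each rule maps the previous state to the next value of one node.
RULES = {
    "A": lambda s: s["A"],                 # input node keeps its value
    "B": lambda s: s["A"] and not s["C"],  # A activates B, C inhibits B
    "C": lambda s: s["B"],                 # B activates C
}


def step(state):
    """Synchronous update: every node is recomputed from the previous state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def simulate(initial, n_steps):
    """Return the trajectory [state_0, state_1, ..., state_n_steps]."""
    trajectory = [dict(initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


def random_runs(n_runs, n_steps, seed=0):
    """Simulate n_runs trajectories, each from a random initial state."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        initial = {node: rng.random() < 0.5 for node in RULES}
        runs.append(simulate(initial, n_steps))
    return runs


if __name__ == "__main__":
    for state in simulate({"A": True, "B": False, "C": False}, 3):
        print(state)
```

A learner working on a task of this kind sees only trajectories such as those returned by `random_runs` and must recover the rule for each node.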
Dozens of researchers tried some of the tasks and/or worked on preparing
new tasks. Two new contributed datasets were added in the course of the challenge.
- CauseEffectPairs:
Find the causal direction in eight pairs of variables. This task is built
from real time series of weather data (e.g., temperature and altitude for
a pair).
- STEMMATOLOGY:
Reconstruct a family tree of documents derived from one another.
Although it was introduced only midway through the challenge, the
CauseEffectPairs task received a lot of attention, and the winners, Kun Zhang
and Aapo Hyvärinen (University of Helsinki, Finland), correctly identified the
cause-effect direction in all 8 pairs of variables. This result was found
statistically significant in a sign rank test (with risk <1%). Five other
datasets were contributed but were not ready in time to be made part of the
challenge. One of them, the NOISE dataset of EEG signals donated by Guido
Nolte (Fraunhofer FIRST, Berlin, Germany), won the best dataset award. The
newly donated datasets are available from our repository, and we intend to
organize new events around them.
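To see why 8 correct directions out of 8 is significant at the 1% level, consider a null hypothesis of random guessing, under which each direction is right with probability 1/2. The sketch below computes the one-sided tail probability. The function name and the choice of a plain binomial sign test are assumptions made for illustration, since the exact test procedure is not detailed here.

```python
from math import comb


def sign_test_p_value(n_correct: int, n_pairs: int) -> float:
    """One-sided p-value of a sign test against random guessing.

    Probability of getting at least n_correct directions right out of
    n_pairs when each guess is correct with probability 1/2.
    """
    tail = sum(comb(n_pairs, k) for k in range(n_correct, n_pairs + 1))
    return tail / 2 ** n_pairs


if __name__ == "__main__":
    p = sign_test_p_value(8, 8)  # (1/2)**8 = 1/256
    print(f"p = {p:.6f}, below 1%: {p < 0.01}")
```

With all 8 pairs correct the tail reduces to a single term, 1/256, or about 0.39%. With 7 correct out of 8 it would be 9/256, or about 3.5%, which no longer clears the 1% level.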
Significant progress was made on several other tasks of the challenge.
Noteworthy are five contributions, which received special mentions:
- Two groups made significant advances on the SIGNET
dataset by proposing new methods to learn dynamic causal Boolean networks.
Cheng Zheng and Zhi Geng
(Peking University, China) propose a new method to reduce the set of
explanatory variables. Their method obtains the best prediction accuracy. Working
on the same dataset, Mehreen
Saeed (University of Lahore, Pakistan) proposes an elegant approach
using Bernoulli mixture models for identifying corners of a hypercube and
extracting Boolean rules from data.
- Two groups made significant advances on the LOCANET
task, which consists in learning local causal networks from observational
data. Ernest Mwebaze and John Quinn (Makerere University, Kampala, Uganda)
obtained the best results so far on the REGED dataset, using committee-based
structure learning. The group of You Zhou, Changzhang Wang, Jianxin Yin, and
Zhi Geng (Peking University, China) extended their method PCD-by-PCD, which
won a prize in the first causality challenge we organized, and obtained the
best results so far on SIDO.
Also worthy of attention are the results of participants who were not eligible
for a prize because they did not submit a paper: a group of students of Deniz
Yuret (Koc University, Turkey) obtained the best results so far on the CINA
dataset, and Catharina Olsen (Universite Libre de Bruxelles, Belgium) obtained
the best results on MARTI.
However, no method works well across all datasets at this stage.
- The Advanced Analytics group of Eugene Tuv (Intel, LTD) tackled the TIED
dataset to uncover all the possible Markov boundaries of a target variable
using decision trees. They correctly uncovered 3 groups of equivalent variables,
but omitted two groups.