Discovery of Causation Direction by Machine Learning
Techniques
Mario de-Prado-Cumplido,
Antonio Artes-Rodriguez
Universidad Carlos III de Madrid, Spain
Obtaining a causal relation between variables is a challenging task, especially with observational
data. We use some well-known machine learning techniques to tackle causal discovery.
We introduce in this paper two novel algorithms that are able to direct the edges between
variables, starting from a known skeleton of the causal graph. The first algorithm is a
search-and-score procedure that maximizes the conditional likelihood of the graph using
the BDeu score. The directions of the edges are encoded in a binary string. Then, a genetic
algorithm is used to search for the optimum over the space of all possible arrangements of
the directed edges. The second algorithm uses classifiers to identify the
"V" structures present in the graph. This constraint-based procedure considers
chains of three adjacent nodes, X - Y - T. Two kNN classifiers are trained to classify
the node T at the end of the chain, one with Y alone as input and the other with both
{X, Y}. If the Markov condition is satisfied and both X and Y are relevant to the
classifier, then the correct direction of the edges must be X -> Y <- T. A naive
bootstrap empirical distribution function is obtained to construct a hypothesis test.
The algorithms are evaluated on a synthetic database.
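
The binary-string encoding of edge directions used by the first algorithm can be illustrated with a minimal sketch. The abstract does not give the bit convention or say how cyclic candidates are handled, so everything below is an assumption made for illustration: the names `decode_orientation` and `is_acyclic` are hypothetical, bit 0 is taken to keep a skeleton edge (a, b) as a -> b and bit 1 to flip it, and acyclicity is checked with Kahn's algorithm. The BDeu scoring and the genetic search themselves are not shown.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def decode_orientation(skeleton: Sequence[Edge], bits: Sequence[int]) -> List[Edge]:
    """Turn a binary string into directed edges over a fixed skeleton.

    Assumed convention (not from the paper): bit 0 keeps (a, b) as a -> b,
    bit 1 reverses it to b -> a.
    """
    if len(skeleton) != len(bits):
        raise ValueError("need exactly one bit per skeleton edge")
    return [(a, b) if bit == 0 else (b, a) for (a, b), bit in zip(skeleton, bits)]


def is_acyclic(directed_edges: Sequence[Edge]) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every node can be removed
    in topological order."""
    nodes = {n for edge in directed_edges for n in edge}
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for parent, child in directed_edges:
        children[parent].append(child)
        indegree[child] += 1

    frontier = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                frontier.append(child)
    return removed == len(nodes)


# The chain X - Y - T from the text, oriented as the "V" structure X -> Y <- T.
chain = [("X", "Y"), ("Y", "T")]
v_structure = decode_orientation(chain, [0, 1])

# A triangle skeleton shows that some bit strings decode to cyclic graphs,
# which a search procedure would have to reject or repair.
triangle = [("X", "Y"), ("Y", "T"), ("X", "T")]
cyclic = decode_orientation(triangle, [0, 0, 1])   # X -> Y -> T -> X
acyclic = decode_orientation(triangle, [0, 0, 0])  # X -> Y, Y -> T, X -> T
```

With this encoding a skeleton of m edges yields 2^m candidate strings, which is the search space a genetic algorithm would explore; each acyclic candidate would then be scored, for example with BDeu.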