# Causal inference - conditional independences and beyond

#### Abstract

Machine learning has traditionally focused on prediction: given observations generated by an unknown stochastic dependency, the goal is to infer a law that correctly predicts future observations generated by the same dependency. Causal inference, in contrast,
tries to infer the causal structure underlying the observed dependences. More precisely,
one tries to predict how a system behaves under interventions without actually performing them, a task that does not fit into any traditional prediction scenario.
It is still debated whether this is possible at all; and even granting that it is,
it is a priori unclear why machine learning tools should be helpful for the task.

Since the 1980s, a community of researchers, mostly from statistics, philosophy, and computer science, has developed methods for inferring causal relationships from observational data. The
pioneering work of Glymour, Scheines, Spirtes, and Pearl describes assumptions that link conditional statistical dependences to causality, which
renders many causal inference problems solvable. The typical task solved by the corresponding algorithms
reads: given observations from the joint distribution of the variables X_1, ..., X_n with n ≥ 3, infer the causal directed acyclic graph (or parts of it).
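
To illustrate the kind of reasoning such algorithms use (this is a minimal sketch, not the algorithms themselves): in the linear-Gaussian case, conditional independence can be tested via partial correlation. For a chain X → Y → Z, the data satisfy X ⟂ Z | Y, which an independence-based algorithm would exploit to remove the edge X–Z from the graph skeleton. The generated data, the partial-correlation test, and the thresholds below are all illustrative choices.

```python
# Sketch: conditional independence via partial correlation on a chain X -> Y -> Z.
# X and Z are marginally dependent, but independent given Y.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)    # X -> Y
z = -1.5 * y + rng.normal(size=n)   # Y -> Z

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c from each."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

r_xz = np.corrcoef(x, z)[0, 1]      # clearly nonzero: X and Z are dependent
r_xz_y = partial_corr(x, z, y)      # near zero: X is independent of Z given Y
print(r_xz, r_xz_y)
```

An independence-based method runs many such tests over growing conditioning sets and then orients as many edges as the resulting independence pattern allows.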

Recently, this work has been complemented by several researchers from machine learning, who have described methods that do not rely on conditional independences alone but exploit
other properties of joint probability distributions. These methods use established machine learning tools such as regression and reproducing kernel Hilbert spaces.
In contrast to the approaches above, they can sometimes infer the causal direction when only two variables are observed.
Remarkably, this can help with more traditional machine learning tasks such as prediction under changing background conditions, because
that task has different solutions depending on whether the predicted variable is the cause or the effect.

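
The two-variable idea can be sketched as follows (an illustrative toy, not the exact methods referenced above): fit a regression in both candidate directions and prefer the direction whose residuals are more independent of the input. The synthetic cubic mechanism, the cubic regression, and the simple biased HSIC estimate used as a kernel dependence measure are all assumptions made for this sketch.

```python
# Sketch: infer the causal direction between two variables by comparing
# residual independence under regressions in both directions.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.uniform(-1.0, 1.0, n)
y = x + x**3 + 0.2 * rng.uniform(-1.0, 1.0, n)  # X -> Y with additive non-Gaussian noise

def rbf_gram(v, sigma):
    """Gaussian-kernel Gram matrix of a one-dimensional sample."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(a, b):
    """Biased HSIC estimate: a kernel measure of statistical dependence."""
    m = len(a)
    K = rbf_gram(a, a.std())
    L = rbf_gram(b, b.std())
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return np.trace(K @ H @ L @ H) / m**2

# Fit a cubic in each direction and compare how dependent the residuals
# are on the regression input.
res_fwd = y - np.polyval(np.polyfit(x, y, 3), x)  # regress effect on cause
res_bwd = x - np.polyval(np.polyfit(y, x, 3), y)  # regress cause on effect

hsic_fwd = hsic(x, res_fwd)
hsic_bwd = hsic(y, res_bwd)
print("inferred direction:", "X -> Y" if hsic_fwd < hsic_bwd else "Y -> X")
```

In the anticausal direction the residuals retain a dependence on the input (here, their spread varies with y), so the independence score favors the true direction X → Y.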
#### Outline

- Introductory remarks: causal dependences versus statistical dependences

- Independence based causal inference: assumptions/algorithms/limitations

- New inference principles via restricting model classes

- Foundation of new inference rules by algorithmic information theory

- How machine learning can benefit from causal inference

JanzingSchoelkopf.pdf