Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116(4), 661–716. doi:10.1037/a0017201

Summary

In this paper, Griffiths & Tenenbaum outline a probabilistic framework for causal induction that supports inference over variables, graphs, and theories. The framework has three main components:

Ontology of entities, properties, and relations (i.e., “how entities are differentiated on the basis of their causal properties”, pg. 663)
Plausible relations between entities (i.e., what types of relationships should be considered in a given domain)
Functional forms of relations (i.e., the direction in which causes act, and how causes combine)

These three components are implemented in a probabilistic graphical model as:

Ontology = variables
Plausible relations = graph (the structure)
Functional form = probability distribution (the parameterization)

Given an existing ontology, learning the plausible relations is a type of structure learning, and learning the functional form is a type of parameter estimation.

Griffiths & Tenenbaum note, however, that knowledge of ontologies, plausible relations, and functional forms cannot itself be expressed within the framework of graphical models:

Knowledge about the ontology, plausibility, and functional form of causal relationships should influence the prior, likelihood, and hypothesis space for Bayesian inference. However, expressing this knowledge requires going beyond the representational capacities of causal graphical models. Although this knowledge can be instantiated in a causal graphical model, it generalizes over a set of such models, and thus cannot be expressed in any one model. (pg. 669)

They argue that this type of knowledge is akin to the notion of an intuitive theory, and draw an analogy to language: theories correspond to grammars, causal structures to syntactic structures, and data to sentences. To formalize theories in their framework, they use hierarchical Bayesian models, which allow inference to be performed at the level of the theory as well as at the lower levels of causal structures and relationships.

Methods

n/a

Algorithm

Types of functional forms

Noisy-OR

$p(e^+\vert c; w_0, w_1)=1-(1-w_0)(1-w_1)^c$

where $e^+$ is the presence of the effect, $c$ is the presence/absence of the cause, $w_0$ is the probability of the effect in the absence of the cause, and $w_1$ is the probability that the cause, acting alone, produces the effect.

Noisy-AND-NOT

$p(e^+\vert c; w_0, w_1)=w_0(1-w_1)^c$

Generic

$\begin{align*} p(e^+\vert c^-)&=w_0\\ p(e^+\vert c^+)&=w_1 \end{align*}$
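The three discrete functional forms above can be written directly as functions of the cause indicator $c$ and the parameters $w_0$ and $w_1$. This is a minimal sketch that transcribes the equations as given; the function names are chosen for illustration and do not come from the paper.

```python
def noisy_or(c, w0, w1):
    """p(e+ | c) under a noisy-OR: background w0 or cause w1 can produce the effect."""
    return 1 - (1 - w0) * (1 - w1) ** c


def noisy_and_not(c, w0, w1):
    """p(e+ | c) under a noisy-AND-NOT: the cause prevents a background effect."""
    return w0 * (1 - w1) ** c


def generic(c, w0, w1):
    """p(e+ | c) under the generic form: one free probability per value of c."""
    return w1 if c else w0
```

With $c=0$ all three forms reduce to $w_0$; they differ only in how the presence of the cause changes the probability of the effect.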

Continuous Noisy-OR

$\lambda(t)=\sum_i w_i\delta(t, t_i)$

where $w_i$ and $t_i$ are the weight and time associated with the $i$th cause.

Medical contingency data

This case study shows how the framework can account for contingency data.

Ontology

Types:

$\mathrm{Chemical}$ ($N_C\sim P_C$)
$\mathrm{Gene}$ ($N_G\sim P_G$)
$\mathrm{Mouse}$ ($N_M\sim P_M$)

Predicates:

$\mathrm{Injected}(\mathrm{Chemical}, \mathrm{Mouse}) \rightarrow [0, 1]$
$\mathrm{Expressed}(\mathrm{Gene}, \mathrm{Mouse}) \rightarrow [0, 1]$

Plausible relations

$\mathrm{Injected}(C, M) \rightarrow \mathrm{Expressed}(G, M)$, true for all $M$ with probability $p$ for each $C$, $G$ pair

Functional forms

$\mathrm{Injected}(C, M) \sim \mathrm{Bernoulli}(\cdot{})$
$\mathrm{Expressed}(G, M) \sim \mathrm{Bernoulli}(\nu)$ for $\nu$ from a noisy-OR, noisy-AND-NOT, or generic functional form, where $w_0$ and $w_1$ are drawn from uniform distributions
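To make the theory above concrete, here is a rough sketch of structure inference for a single chemical-gene pair under the noisy-OR form. It compares a graph with no edge against a graph with an $\mathrm{Injected} \rightarrow \mathrm{Expressed}$ edge, integrating out uniformly distributed $w_0$ and $w_1$. The grid-based integration, the function name, and the argument names are assumptions made for this sketch; `prior_edge` plays the role of $p$.

```python
import numpy as np


def posterior_edge(n_c1_e1, n_c1_e0, n_c0_e1, n_c0_e0, prior_edge=0.5, grid=200):
    """Posterior probability that the cause -> effect edge exists.

    Counts are contingency-table cells: n_c1_e1 is the number of mice with
    the cause present and the effect present, and so on.
    """
    # Midpoint grid on (0, 1); averaging over it approximates an integral
    # against a uniform prior.
    w = (np.arange(grid) + 0.5) / grid

    # No edge: the effect occurs with probability w0 regardless of the cause.
    n_e1 = n_c1_e1 + n_c0_e1
    n_e0 = n_c1_e0 + n_c0_e0
    m0 = np.mean(w ** n_e1 * (1 - w) ** n_e0)

    # Edge present: noisy-OR combination of background (w0) and cause (w1).
    w0, w1 = np.meshgrid(w, w, indexing="ij")
    p_c0 = w0
    p_c1 = 1 - (1 - w0) * (1 - w1)
    lik1 = (p_c0 ** n_c0_e1 * (1 - p_c0) ** n_c0_e0
            * p_c1 ** n_c1_e1 * (1 - p_c1) ** n_c1_e0)
    m1 = np.mean(lik1)

    # Bayes' rule over the two candidate graphs.
    return prior_edge * m1 / (prior_edge * m1 + (1 - prior_edge) * m0)
```

For example, `posterior_edge(8, 0, 0, 8)` (effect always present with the cause, never without it) should come out close to 1, while `posterior_edge(4, 4, 4, 4)` (no contingency) should fall below the prior, because the extra parameter of the edge graph is penalized when the data do not need it.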

Blicket detection

This case study shows how the framework can account for small amounts of data.

Ontology

Types:

$\mathrm{Block}$ ($N_B\sim P_B$)
- $\mathrm{Blicket}$ ($p$)
- $\mathrm{NonBlicket}$ ($1-p$)
$\mathrm{Detector}$ ($N_D\sim P_D$)
$\mathrm{Trial}$ ($N_T\sim P_T$)

Predicates:

$\mathrm{Contact}(\mathrm{Block}, \mathrm{Detector}, \mathrm{Trial}) \rightarrow [0, 1]$
$\mathrm{Active}(\mathrm{Detector}, \mathrm{Trial}) \rightarrow [0, 1]$

Plausible relations

$\mathrm{Contact}(B, D, T) \rightarrow \mathrm{Active}(D, T)$, true for all $T$ and any $D$ if $B$ is a $\mathrm{Blicket}$

Functional forms

$\mathrm{Contact}(B, D, T) \sim \mathrm{Bernoulli}(\cdot{})$
$\mathrm{Active}(D, T) \sim \mathrm{Bernoulli}(\nu)$ for $\nu$ from a noisy-OR, where $w_0=\epsilon$ and $w_1=1-\epsilon$

In the deterministic detector theory, $\epsilon=0$. In the probabilistic detector theory, $\epsilon>0$.
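With only a handful of blocks, inference under this theory can be sketched by enumerating every assignment of blocks to $\mathrm{Blicket}$ or $\mathrm{NonBlicket}$. The code below is an illustrative sketch, not the paper's implementation: the function name, the trial encoding, and the example value $p=0.3$ are assumptions. The activation probability follows the noisy-OR with $w_0=\epsilon$ and $w_1=1-\epsilon$ for each blicket in contact with the detector.

```python
from itertools import product


def blicket_posterior(n_blocks, trials, p=0.3, eps=0.0):
    """Posterior probability that each block is a blicket.

    trials is a list of (blocks_on_detector, detector_active) pairs, where
    blocks_on_detector is a set of block indices. Assumes the observed
    trials have nonzero probability under at least one hypothesis.
    """
    marginals = [0.0] * n_blocks
    total = 0.0
    for hypothesis in product([0, 1], repeat=n_blocks):
        # Prior: each block is independently a blicket with probability p.
        prior = 1.0
        for is_blicket in hypothesis:
            prior *= p if is_blicket else 1 - p
        # Likelihood: noisy-OR with w0 = eps and w1 = 1 - eps per blicket.
        lik = 1.0
        for blocks, active in trials:
            k = sum(hypothesis[b] for b in blocks)
            p_active = 1 - (1 - eps) * eps ** k
            lik *= p_active if active else 1 - p_active
        joint = prior * lik
        total += joint
        for i, is_blicket in enumerate(hypothesis):
            if is_blicket:
                marginals[i] += joint
    return [m / total for m in marginals]
```

Under the deterministic theory, `blicket_posterior(2, [({0, 1}, True), ({0}, True)])` illustrates a backwards-blocking pattern: once block 0 alone activates the detector, block 0 is certainly a blicket and the posterior for block 1 returns to the prior $p$.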

Stick ball machine

This case study shows how the framework can account for hidden causes.

Ontology

Types:

$\mathrm{Ball}$ ($N_B\sim P_B$)
$\mathrm{HiddenCause}$ ($N_H=\infty$)
$\mathrm{Trial}$ ($N_T\sim P_T$)

Predicates:

$\mathrm{Moves}(\mathrm{Ball}, \mathrm{Trial}) \rightarrow [0, 1]$
$\mathrm{Active}(\mathrm{HiddenCause}, \mathrm{Trial}) \rightarrow [0, 1]$

Plausible relations

$\mathrm{Moves}(B_1, T) \rightarrow \mathrm{Moves}(B_2, T)$, true for all $T$ with probability $p$ for each $B_1\neq B_2$ pair
$\mathrm{Active}(H, T) \rightarrow \mathrm{Moves}(B, T)$, each $B$ has an edge from some $H$ with probability $q$. The particular $H$ is chosen according to a Chinese Restaurant Process (i.e., based on number of edges)

Functional forms

$\mathrm{Active}(H, T) \sim \mathrm{Bernoulli}(\cdot{})$
$\mathrm{Moves}(B_1, T) \sim \mathrm{Bernoulli}(\nu)$ for $\nu$ from a noisy-OR, where $w_0=0$ and $w_i=\omega$ for the $i^{th}$ cause (either $B_2$ or some $H$)
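The Chinese Restaurant Process mentioned in the plausible relations above can be sketched as a sequential assignment: each ball that has a hidden cause joins an existing hidden cause with probability proportional to the number of edges that cause already has, or a new hidden cause with probability proportional to a concentration parameter. The parameter `alpha` and the function name are assumptions for illustration; the notes do not specify them, and the step that decides (with probability $q$) whether a ball has a hidden cause at all is not shown.

```python
import random


def crp_assign(n_items, alpha=1.0, seed=None):
    """Assign items to hidden causes with a Chinese Restaurant Process.

    Returns (assignments, counts): assignments[i] is the index of the hidden
    cause chosen for item i, and counts[k] is the number of items (edges)
    attached to hidden cause k.
    """
    rng = random.Random(seed)
    assignments = []
    counts = []
    for _ in range(n_items):
        # Existing causes are weighted by their edge counts; the final
        # entry is the weight for opening a new hidden cause.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts
```

Because the weights grow with the edge counts, hidden causes that already explain several balls are more likely to be reused, which is what lets a potentially infinite set of hidden causes ($N_H=\infty$) stay small in practice.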

Lemur colonies

This case study shows how the framework can account for hidden causes in a spatial domain.

Ontology

Types:

$\mathrm{Colony}$ ($N_C\sim P_C$)
$\mathrm{HiddenCause}$ ($N_H\in [0, 1]$)

Predicates:

$\mathrm{Location}(\mathrm{Colony}) \rightarrow \mathcal{R}\subset\mathbb{R}^2$
$\mathrm{Nexus}(\mathrm{HiddenCause}) \rightarrow \mathcal{R}\subset\mathbb{R}^2$

Plausible relations

$\mathrm{Nexus}(H) \rightarrow \mathrm{Location}(C)$, true with probability $p$ for each $H$, $C$ pair

Functional forms

$\mathrm{Nexus}(H) \sim \mathrm{Uniform}(\mathcal{R})$
$\mathrm{Location}(C) \sim \mathcal{N}(\mathrm{Nexus}(H),\Sigma)$ if $\mathrm{Nexus}(H) \rightarrow \mathrm{Location}(C)$, otherwise $\mathcal{N}((0, 0), \infty)$

Exploding cans

This case study shows how the framework can account for hidden causes in a spatiotemporal domain.

Ontology

Types:

$\mathrm{Can}$ ($N_C\sim P_C$)
$\mathrm{HiddenCause}$ ($N_H=\infty$)

Predicates:

$\mathrm{ExplosionTime}(\mathrm{Can}) \rightarrow \mathbb{R}^+$ (time)
$\mathrm{ActivationTime}(\mathrm{HiddenCause}) \rightarrow \mathbb{R}^+$
$\mathrm{Location}(\mathrm{Can}) \rightarrow \mathbb{R}$

Plausible relations

$\mathrm{ExplosionTime}(C_1) \rightarrow \mathrm{ExplosionTime}(C_2)$, true with probability 1 for each $C_1\neq C_2$ pair
$\mathrm{ActivationTime}(H)\rightarrow \mathrm{ExplosionTime}(C)$, each $C$ has an edge from some $H$ with probability 1. The particular $H$ is chosen according to a Chinese Restaurant Process (i.e., based on number of edges)

Functional forms

$\mathrm{ActivationTime}(H)\sim \mathrm{Exponential}(\alpha)$
$\mathrm{ExplosionTime}(C_1)\sim \mathrm{Exponential}(\lambda(t))$ for $\lambda(t)$ from a continuous noisy-OR with $w_i=\omega$, where each cause time is either $t_i=\mathrm{ActivationTime}(H)$ or $t_i=\mathrm{ExplosionTime}(C_2)+\vert\mathrm{Location}(C_2)-\mathrm{Location}(C_1)\vert/\mu$
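The second kind of cause time in the functional form above depends on both when and where another can exploded. As a small worked sketch, the function below computes $t_i$ for every other can relative to a target can. Reading $\mu$ as a propagation speed is an assumption, as are the function and argument names.

```python
def cause_times_from_cans(explosion_times, locations, target, mu):
    """Times at which other cans' explosions act on the target can.

    Implements t_i = ExplosionTime(C_2) + |Location(C_2) - Location(C_1)| / mu
    for each can C_2 other than the target C_1.
    """
    times = []
    for j, (t_j, x_j) in enumerate(zip(explosion_times, locations)):
        if j == target:
            continue
        times.append(t_j + abs(x_j - locations[target]) / mu)
    return times
```

For instance, with cans at locations 0 and 2, an explosion of the first can at time 0, and $\mu=4$, the cause time for the second can is $0 + 2/4 = 0.5$.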

Cross-domain causal induction

Ontology

Types:

$\mathrm{Cause}$ ($N_C\sim P_C$)
- $\mathrm{InDomain}$
- $\mathrm{OutDomain}$
$\mathrm{Effect}$ ($N_E\sim P_E$)
$\mathrm{Trial}$ ($N_T\sim P_T$)

Predicates:

$\mathrm{Present}(\mathrm{Cause}, \mathrm{Trial})\rightarrow [0, 1]$
$\mathrm{Active}(\mathrm{Effect}, \mathrm{Trial})\rightarrow [0, 1]$

Plausible relations

$\mathrm{Present}(C, T)\rightarrow \mathrm{Active}(E, T)$, true for all $T$ with probability $p$ for each $C$, $E$ pair when $C$ is an $\mathrm{InDomain}$ cause, and with probability $q$ for each $C$, $E$ pair when $C$ is an $\mathrm{OutDomain}$ cause.

Functional forms

$\mathrm{Present}(C, T)\sim \mathrm{Bernoulli}(\cdot{})$
$\mathrm{Active}(E, T)\sim \mathrm{Bernoulli}(\nu)$ for $\nu$ from a noisy-OR with $w_0=\epsilon$ and $w_i=1-\epsilon$

Takeaways

The framework outlined by Griffiths & Tenenbaum is rich and flexible, and supports causal reasoning across a wide range of domains. It would be really interesting to try to apply this to even more complex physical domains. For example, can this framework be extended to explain some of the results from Michotte experiments, such as the launching effect? It’s not immediately clear to me how you would do this, but my guess is that it would be somewhat similar to the exploding can example, which includes both spatial and temporal causes.