Theory-based causal induction
Griffiths, T. L., & Tenenbaum, J. B. (2009). Theory-based causal induction. Psychological Review, 116(4), 661–716. doi:10.1037/a0017201
Summary
In this paper, Griffiths & Tenenbaum outline a probabilistic framework for causal induction that allows inference to be done over variables, graphs, and theories. The framework has three main components:
- Ontology of entities, properties, and relations (i.e., “how entities are differentiated on the basis of their causal properties”, pg. 663)
- Plausible relations between entities (i.e., what types of relationships should be considered in a given domain)
- Functional forms of relations (i.e., the direction in which causes act, and how causes combine)
These three components are implemented in a probabilistic graphical model as:
- Ontology = variables
- Plausible relations = graph (the structure)
- Functional form = probability distribution (the parameterization)
Given an existing ontology, learning the plausible relations is a type of structure learning, and learning the functional form is a type of parameter estimation.
Griffiths & Tenenbaum note, however, that knowledge of ontologies, plausible relations, and functional forms itself is not something that can be expressed through the framework of graphical models:
Knowledge about the ontology, plausibility, and functional form of causal relationships should influence the prior, likelihood, and hypothesis space for Bayesian inference. However, expressing this knowledge requires going beyond the representational capacities of causal graphical models. Although this knowledge can be instantiated in a causal graphical model, it generalizes over a set of such models, and thus cannot be expressed in any one model. (pg. 669)
They argue that this type of knowledge is akin to the notion of an intuitive theory, and draw the relationship between theories and grammars; causal structures and syntactic structures; and data and sentences. To formalize theories in their framework, they make use of hierarchical Bayesian models which allow inference to be performed both at the level of the theory as well as the lower levels of causal structures and relationships.
Methods
n/a
Algorithm
Types of functional forms
Noisy-OR
where $e^+$ is the presence of the effect, $c$ is the presence/absence of the cause $w_0$ is the probability of the effect in the absence of the cause, and $w_1$ is the probability of the effect given a single cause.
Noisy-AND-NOT
Generic
Continuous Noisy-OR
where $w_i$ and $t_i$ are the weight and time associated with the $i$th cause.
Medical contingency data
This case study shows how the framework can account for contingency data.
Ontology
Types:
- $\mathrm{Chemical}$ ($N_C\sim P_C$)
- $\mathrm{Gene}$ ($N_G\sim P_G$)
- $\mathrm{Mouse}$ ($N_M\sim P_M$)
Predicates:
- $\mathrm{Injected}(\mathrm{Chemical}, \mathrm{Mouse}) \rightarrow [0, 1]$
- $\mathrm{Expressed}(\mathrm{Gene}, \mathrm{Mouse}) \rightarrow [0, 1]$
Plausible relations
- $\mathrm{Injected}(C, M) \rightarrow \mathrm{Expressed}(G, M)$, true for all $M$ with probability $p$ for each $C$, $G$ pair
Functional forms
- $\mathrm{Injected}(C, M) \sim \mathrm{Bernoulli}(\cdot{})$
- $\mathrm{Expressed}(G, M) \sim \mathrm{Bernoulli}(\nu)$ for $\nu$ from a noisy-OR, noisy-AND-NOT, or generic functional form, where $w_0$ and $w_1$ are drawn from uniform distributions
Blicket detection
This case study shows how the framework can account for small amounts of data.
Ontology
Types:
- $\mathrm{Block}$ ($N_B\sim P_B$)
- $\mathrm{Blicket}$ ($p$)
- $\mathrm{NonBlicket}$ ($1-p$)
- $\mathrm{Detector}$ ($N_D\sim P_D$)
- $\mathrm{Trial}$ ($N_T\sim P_T$)
Predicates:
- $\mathrm{Contact}(\mathrm{Block}, \mathrm{Detector}, \mathrm{Trial}) \rightarrow [0, 1]$
- $\mathrm{Active}(\mathrm{Detector}, \mathrm{Trial}) \rightarrow [0, 1]$
Plausible relations
- $\mathrm{Contact}(B, D, T) \rightarrow \mathrm{Active}(D, T)$ for all $T$ for any $D$ if $B$ is a $\mathrm{Blicket}$
Functional forms
- $\mathrm{Contact}(B, D, T) \sim \mathrm{Bernoulli}(\cdot{})$
- $\mathrm{Active}(D, T) \sim \mathrm{Bernoulli}(\nu)$ for $\nu$ from a noisy-OR, where $w_0=\epsilon$ and $w_1=1-\epsilon$
In the deterministic detector theory, $\epsilon=0$. In the probabilistic dector theory, $\epsilon>0$.
Stick ball machine
This case study shows how the framework can account for hidden causes.
Ontology
Types:
- $\mathrm{Ball}$ ($N_B\sim P_B$)
- $\mathrm{HiddenCause}$ ($N_H=\inf$)
- $\mathrm{Trial}$ ($N_T\sim P_T$)
Predicates:
- $\mathrm{Moves}(\mathrm{Ball}, \mathrm{Trial}) \rightarrow [0, 1]$
- $\mathrm{Active}(\mathrm{HiddenCause}, \mathrm{Trial}) \rightarrow [0, 1]$
Plausible relations
- $\mathrm{Moves}(B_1, T) \rightarrow \mathrm{Moves}(B_2, T)$, true for all $T$ with probability $p$ for each $B_1\neq B_2$ pair
- $\mathrm{Active}(H, T) \rightarrow \mathrm{Moves}(B, T)$, each $B$ has an edge from some $H$ with probability $q$. The particular $H$ is chosn according to a Chinese Restaurant Process (i.e. based on number of edges)
Functional forms
- $\mathrm{Active}(H, T) \sim \mathrm{Bernoulli}(\cdot{})$
- $\mathrm{Moves}(B_1, T) \sim \mathrm{Bernoulli}(\nu)$ for $\nu$ from a noisy-OR, where $w_0=0$ and $w_i=\omega$ for the $i^{th}$ cause (either $B_2$ or some $H$)
Lemur colonies
This case study shows how the framework can account for hidden causes in a spatial domain.
Ontology
Types:
- $\mathrm{Colony}$ ($N_C\sim P_C$)
- $\mathrm{HiddenCause}$ ($N_H\in [0, 1]$)
Predicates:
- $\mathrm{Location}(\mathrm{Colony}) \rightarrow \mathcal{R}\subset\mathbb{R}^2$
- $\mathrm{Nexus}(\mathrm{HiddenCause}) \rightarrow \mathcal{R}\subset\mathbb{R}^2$
Plausible relations
- $\mathrm{Nexus}(H) \rightarrow \mathrm{Location}(C)$, true with probability $p$ for each $H$, $C$ pair
Functional forms
- $\mathrm{Nexus}(H) \sim \mathrm{Uniform}(\mathcal{R})$
- $\mathrm{Location}(C) \sim \mathcal{N}(\mathrm{Nexus}(H),\Sigma)$ if $\mathrm{Nexus}(H) \rightarrow \mathrm{Location}(C)$, otherwise $\mathcal{N}((0, 0), \inf)$
Exploding cans
This case study shows how the framework can account for hidden causes in a spatiotemporal domain.
Ontology
Types:
- $\mathrm{Can}$ ($N_C\sim P_C$)
- $\mathrm{HiddenCause}$ ($N_H=\inf$)
Predicates:
- $\mathrm{ExplosionTime}(\mathrm{Can}) \rightarrow \mathbb{R}+$ (time)
- $\mathrm{ActivationTime}(\mathrm{HiddenCause}) \rightarrow \mathbb{R}+$
- $\mathrm{Location}(\mathrm{Can}) \rightarrow \mathbb{R}$
Plausible relations
- $\mathrm{ExplosionTime}(C_1) \rightarrow \mathrm{ExplosionTime}(C_2)$, true with probability 1 for each $C_1\neq C_2$ pair
- $\mathrm{ActivationTime}(H)\rightarrow \mathrm{ExplosionTime}(C)$, each $C$ has an edge from some $H$ with probability 1. The particular $H$ is chosn according to a Chinese Restaurant Process (i.e. based on number of edges)
Functional forms
- $\mathrm{ActivationTime}(H)\sim \mathrm{Exponential}(\alpha)$
- $\mathrm{ExplosionTime}(C_1)\sim \mathrm{Exponential}(\lambda(t))$ for $\lambda(t)$ from a continuous noisy-OR with $w_i=\omega$ for either times $t_i=\mathrm{ActivationTime}(H)$ or $t_i=\mathrm{ExplosionTime}(C_2)+\vert\mathrm{Location}(C_2)-\mathrm{Location}(C_1)\vert/\mu$
Cross-domain causal induction
Ontology
Types:
- $\mathrm{Cause}$ ($N_C\sim P_C$)
- $\mathrm{InDomain}$
- $\mathrm{OutDomain}$
- $\mathrm{Effect}$ ($N_E\sim P_E$)
- $\mathrm{Trial}$ ($N_T\sim P_T$)
Predicates:
- $\mathrm{Present}(\mathrm{Cause}, \mathrm{Trial})\rightarrow [0, 1]$
- $\mathrm{Active}(\mathrm{Effect}, \mathrm{Trial})\rightarrow [0, 1]$
Plausible relations
- $\mathrm{Present}(C, T)\rightarrow \mathrm{Active}(E, T)$, true for all $T$ with probability $p$ for each $C$, $E$ pair when $C$ is an $\mathrm{InDomain}$ cause, and with probability $q$ for each $C$, $E$ pair where $C$ is an $\mathrm{OutDomain}$ cause.
Functional forms
- $\mathrm{Present}(C, T)\sim \mathrm{Bernoulli}(\cdot{})$
- $\mathrm{Active}(E, T)\sim \mathrm{Bernoulli}(\nu)$ for $\nu$ from a noisy-OR with $w_0=\epsilon$ and $w_i=1-\epsilon$
Takeaways
The framework outlined by Griffiths & Tenenbaum is an extremely rich and flexible framework for causal reasoning across a wide range of domains. It would be really interesting to try to apply this to even more complex physical domains—for example, can this framework be extended to explain some of the results from Michotte experiments, such as the launching effect? It’s not immediately clear to me how you would do this, but my guess is that it would be somewhat similar to the exploding can example, which includes both spatial and temporal causes.