Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24, 629–640; discussion 652–791. doi:10.1017/S0140525X01000061

Summary

Several notions of similarity and generalization have been proposed previously, most notably:

  • Shepard’s universal law of generalization
  • Tversky’s contrast model

Tenenbaum & Griffiths describe a Bayesian model that unifies these two accounts. In particular, they rely on the size principle, which says that smaller, more specific hypotheses should receive higher likelihood than larger, more generic hypotheses.
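As a concrete instance of the size principle (using the strong-sampling likelihood defined under Algorithm below): $n$ examples consistent with a hypothesis $h$ have likelihood $p(x_1,\dots,x_n\vert h)=1/\vert h\vert^n$, so a consistent hypothesis of size 10 is favored over one of size 100 by a factor of $10^n$. The preference for specific hypotheses thus sharpens exponentially as examples accumulate.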

They also discuss several phenomena that a theory of generalization should account for (and which their model does):

  • Example variability: if the examples are tightly clustered (low variability), generalization outside their range should be less likely
  • Number of examples: as the number of examples within a fixed range increases, generalization outside that range should become less likely

Methods

n/a

Algorithm

The assumption is made that there is some unknown consequential region $C$ and a hypothesis space $\mathcal{H}$ (of which there is some $h\in \mathcal{H}$ such that $h=C$). Then, given an example $x\in C$, the probability that a new item $y$ also lies in $C$ is obtained by averaging over hypotheses:

$$p(y\in C\vert x)=\sum_{h\in\mathcal{H}} p(y\in C\vert h)\,p(h\vert x)=\frac{\sum_{h:\,y\in h} p(x\vert h)\,p(h)}{\sum_{h\in\mathcal{H}} p(x\vert h)\,p(h)},$$

where $p(y\in C\vert h)=1$ if $y\in h$ and 0 otherwise, and where the likelihood $p(x\vert h)$ can be determined either by weak sampling or strong sampling:

  • Weak sampling: $p(x\vert h)=1$ if $x\in h$ and 0 otherwise
  • Strong sampling: $p(x\vert h)=\frac{1}{\vert h\vert }$ if $x\in h$ and 0 otherwise

The prior $p(h)$ can be arbitrary.
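To make the computation concrete, here is a minimal sketch in Python. The setup is hypothetical, not prescribed by the paper: hypotheses are integer intervals $[a,b]$ within $1..100$ with a uniform prior, and all names are illustrative.

```python
import numpy as np

# Hypothetical hypothesis space: all integer intervals [a, b] in 1..100,
# each a candidate consequential region C. Uniform prior assumed.
LO, HI = 1, 100
hypotheses = [(a, b) for a in range(LO, HI + 1) for b in range(a, HI + 1)]

def likelihood(examples, h, strong=True):
    """p(examples | h): 0 if any example falls outside h; otherwise 1 under
    weak sampling, or (1/|h|)^n under strong sampling (the size principle)."""
    a, b = h
    if any(not (a <= x <= b) for x in examples):
        return 0.0
    size = b - a + 1
    return size ** -len(examples) if strong else 1.0

def p_in_C(y, examples, strong=True):
    """p(y in C | examples): sum of posterior mass on hypotheses containing y.
    Assumes at least one hypothesis is consistent with the examples."""
    post = np.array([likelihood(examples, h, strong) for h in hypotheses])
    post /= post.sum()  # uniform prior, so normalized likelihoods = posterior
    contains_y = np.array([a <= y <= b for (a, b) in hypotheses])
    return post[contains_y].sum()

# Demonstrates the phenomena listed in the Summary: several tightly
# clustered examples shrink generalization outside their range.
print(p_in_C(70, [60]))              # one example: broad generalization
print(p_in_C(70, [58, 60, 62, 64]))  # clustered examples: much lower
```

Under weak sampling (`strong=False`) the second probability barely drops, since every consistent interval is weighted equally regardless of size; the sharpening with more examples comes entirely from the size principle.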

Takeaways

Using this type of structured Bayesian model is powerful because it makes our assumptions explicit. Given those assumptions, the phenomena we want to capture fall out naturally, rather than needing to be built in by hand (such as the weights on feature sets in Tversky’s contrast model).

I have the question: why do consequential regions have to be discrete sets or ranges? If you got an example of a wolf, that is kind of a dog, but not exactly, so should it be included as an example of a not-dog, or of a dog? I don’t think this framework allows for that possibility. Actually, as I am writing this, I am realizing that it does. Really, the likelihood $p(x\vert h)$ can be arbitrary and doesn’t need to adhere to strong or weak sampling: e.g., your hypotheses could be the parameters of a Gaussian distribution, in which case you can calculate $p(x\vert h)=\mathcal{N}(x;\mu_h,\sigma_h^2)$.
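As a sketch of that last idea (entirely illustrative: the $(\mu, \sigma)$ grid and values are made up, and this is not a model from the paper), the same posterior-averaging machinery works with Gaussian likelihoods, where generalization to $y$ becomes an expected density rather than a set-membership probability:

```python
from scipy.stats import norm

# Hypothetical hypotheses: a small grid of Gaussian parameters (mu, sigma),
# with a uniform prior over the grid.
hypotheses = [(mu, sigma) for mu in (0.0, 1.0, 2.0) for sigma in (0.5, 1.0)]
prior = 1.0 / len(hypotheses)

def posterior(examples):
    """p(h | examples) with Gaussian likelihoods replacing strong/weak sampling."""
    scores = []
    for mu, sigma in hypotheses:
        lik = 1.0
        for x in examples:
            lik *= norm.pdf(x, mu, sigma)
        scores.append(lik * prior)
    z = sum(scores)
    return [s / z for s in scores]

def p_density(y, examples):
    """Posterior-averaged density at y: sum_h p(y | h) p(h | examples)."""
    return sum(w * norm.pdf(y, mu, sigma)
               for w, (mu, sigma) in zip(posterior(examples), hypotheses))

print(p_density(1.5, [0.9, 1.1, 1.0]))
```

This accommodates the wolf case gracefully: a borderline example is neither forced in nor out of the region, it just receives intermediate likelihood under each hypothesis.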