Kemp, C., Perfors, A., & Tenenbaum, J. B. (2007). Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10(3), 307–321. doi:10.1111/j.1467-7687.2007.00585.x

Summary

In this paper, Kemp et al. formulate a hierarchical Bayesian model (HBM) for learning overhypotheses and demonstrate how it can account for two kinds of empirical results: the shape bias and the grouping of categories into ontological kinds.

Methods

n/a

Algorithm

The model is parameterized as follows:
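
$$
\begin{aligned}
\mathbf{y}^i \mid n^i, \mathbf{\theta}^i &\sim \text{Multinomial}(n^i, \mathbf{\theta}^i) \\
\mathbf{\theta}^i \mid z_i = k, \alpha^k, \mathbf{\beta}^k &\sim \text{Dirichlet}(\alpha^k \mathbf{\beta}^k) \\
\alpha^k &\sim \text{Exponential}(1), \qquad \mathbf{\beta}^k \sim \text{Dirichlet}(\mathbf{1}) \\
\mathbf{z} &\sim \text{CRP}(\gamma)
\end{aligned}
$$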

where $\mathbf{y}^i$ is the data (e.g. counts of features in category $i$), $n^i$ is the number of observations for category $i$, $\mathbf{\theta}^i$ specifies the distribution of features in category $i$, $\alpha^k$ is the degree of uniformity of the feature distributions across categories within ontological kind $k$ (the kind to which category $i$ is assigned, $z_i = k$), $\mathbf{\beta}^k$ is the distribution of features across all categories within kind $k$, and $\mathbf{z}$ is the partition (assignment) of categories to ontological kinds ($\gamma$ is the concentration parameter of the Chinese restaurant process prior over partitions).
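
To make the generative story concrete, here is a rough numpy sketch of sampling from the model (my own illustration, not from the paper: the number of kinds is fixed and $\mathbf{z}$ is drawn uniformly rather than from the CRP prior, the sizes and variable names are arbitrary, and no inference is performed):

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories = 10   # object categories i
n_features = 5      # discrete feature values (e.g. shapes or colors)
n_kinds = 2         # ontological kinds (fixed here; the paper infers the partition z)
n_obs = 20          # observations n^i per category

# z: assignment of each category to an ontological kind
z = rng.integers(n_kinds, size=n_categories)

# Per-kind hyperparameters: alpha^k (uniformity across categories) and
# beta^k (feature distribution shared by the categories of kind k)
alpha = rng.exponential(scale=1.0, size=n_kinds)
beta = rng.dirichlet(np.ones(n_features), size=n_kinds)

# theta^i: feature distribution of category i, drawn from its kind's Dirichlet prior
theta = np.vstack([rng.dirichlet(alpha[z[i]] * beta[z[i]])
                   for i in range(n_categories)])

# y^i: observed feature counts for category i
y = np.vstack([rng.multinomial(n_obs, theta[i]) for i in range(n_categories)])

print(y)
```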

Given training data $\mathbf{y}$, the model can be fit to infer the distribution of features within each category ($\mathbf{\theta}^i$), the distribution of features across the categories of each ontological kind ($\alpha^k$, $\mathbf{\beta}^k$), and the assignment of categories to kinds ($\mathbf{z}$). Given a test exemplar $T$ (with known category) and three choice objects, the model then needs to produce the probability that each choice object belongs to the same category as the exemplar. It’s not entirely clear to me what they specifically compute to get this. One possibility would be:
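
$$
p(c_C = i \mid C, \mathbf{y}) \propto p(c_C = i)\, p(C \mid c_C = i, \mathbf{y}) = p(c_C = i) \int p(C \mid \mathbf{\theta}^i)\, p(\mathbf{\theta}^i \mid \mathbf{y})\, d\mathbf{\theta}^i
$$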

where $c_C$ is the category of the choice object, $C$ are the features of the choice object, and $i$ is the category of the exemplar.
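
In code, this could look something like the sketch below (again my own, not from the paper): the integral over $\mathbf{\theta}$ is approximated with posterior samples (e.g. from some MCMC scheme over the model above), and the function and argument names are made up for illustration.

```python
import numpy as np

def category_posterior(choice_counts, theta_samples_by_category, prior=None):
    """Approximate p(c_C = j | C, y) for one choice object.

    choice_counts: feature-count vector C for the choice object
    theta_samples_by_category: array (n_categories, n_samples, n_features)
        of posterior samples of theta^j for every category j
    prior: optional prior p(c_C = j) over categories; uniform if None
    """
    counts = np.asarray(choice_counts, dtype=float)
    n_categories = theta_samples_by_category.shape[0]
    if prior is None:
        prior = np.full(n_categories, 1.0 / n_categories)

    log_theta = np.log(np.clip(theta_samples_by_category, 1e-12, None))
    # p(C | c_C = j, y) ~= average multinomial likelihood over posterior
    # samples of theta^j (the multinomial coefficient depends only on C,
    # so it cancels in the normalization below and is omitted)
    lik = np.exp(np.einsum('f,jsf->js', counts, log_theta)).mean(axis=1)

    posterior = prior * lik
    return posterior / posterior.sum()

# The probability that a choice object shares the exemplar's category i is
# then category_posterior(C, theta_samples)[i].
```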

Takeaways

This is a nice use of an HBM to show how theories/overhypotheses might be implemented in a computational-level framework. I do wonder, though, how much of the heavy lifting is being done by the fact that they’ve only included features that could be relevant and they’ve discretized the feature domains. In the real world, things are going to be much more ambiguous than that. First, how does the child determine which features are relevant in the first place (why pay attention to overall shape rather than ear shape)? Second, what if some dimensions (e.g. size) are continuous rather than discrete? In the continuous case, modeling the data as feature counts will no longer work.

It would be interesting to see whether this type of framework could be applied to classifying objects based on how they should be simulated. For example, it might be used to first group things ontologically (fluids, rigid bodies, soft bodies), which would determine the overall class of simulation that needs to be run. Then individual categories within each ontological kind might determine the parameters that need to be fed into the simulation (e.g. mass, coefficient of friction). I’m not sure this is exactly the right framework for making those sorts of approximations, but it might be a good place to start for examining how we make the decisions needed to set up a simulation.