Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. doi:10.1017/S0140525X12000477

Summary

In this BBS article, Clark lays out a grand unified theory of cognition based on the idea that our brains are predictive machines. At higher levels of processing, the brain constructs hypotheses/predictions about incoming percepts, while at lower levels, the sensory systems compute error signals between what is predicted and what is actually sensed. He begins with the following assumption about what the brain is trying to accomplish:

How, simply on the basis of patterns and changes in its own internal states, is [the brain] to alter and adapt its resources so as to tune itself to act as a useful node… for the origination of adaptive responses? Notice how different this conception is to ones in which the problem is posed as one of establishing a mapping relation between environmental and inner states. The task is not to find such a mapping but to infer the nature of the signal source (the world) from just the varying input signal itself. (p. 183)

Clark next defines his overarching theory as that of “action-oriented predictive processing”, and quotes from Hawkins & Blakeslee (2004):

As strange as it sounds, when your own behavior is involved, your predictions not only precede sensation, they determine sensation. Thinking of going to the next pattern in a sequence causes a cascading prediction of what you should experience next. As the cascading prediction unfolds, it generates the motor commands necessary to fulfil the prediction. Thinking, predicting, and doing are all part of the same unfolding of sequences moving down the cortical hierarchy. (p. 186)

In other words, action is just a way to bring the world into alignment with the brain’s predictions.

Clark goes on to describe a large body of work that supports these views, drawn particularly from neuroscience and perceptual science. He also explains how the action-oriented predictive processing scheme unifies the levels of analysis, in that it offers a neural implementation of generative Bayesian models.
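
To make the division of labor concrete, here is a minimal sketch of a two-level prediction-error loop in the spirit of the predictive coding models Clark surveys (Rao & Ballard-style). The linear generative mapping, the sizes, and the step size are illustrative assumptions of mine, not details from the article:

```python
import numpy as np

# A minimal two-level predictive coding loop (illustrative only).
# The higher level holds a latent hypothesis `mu`; the lower level
# compares the top-down prediction W @ mu against the sensory input
# and passes the residual (the "prediction error") back up.

rng = np.random.default_rng(0)

n_sensory, n_latent = 8, 3
W = rng.normal(size=(n_sensory, n_latent))   # assumed generative weights
true_latent = rng.normal(size=n_latent)
sensation = W @ true_latent + 0.01 * rng.normal(size=n_sensory)

mu = np.zeros(n_latent)   # the higher level's current hypothesis
lr = 0.02                 # inference step size (assumed)

for _ in range(300):
    prediction = W @ mu               # top-down: predict the sensory input
    error = sensation - prediction    # bottom-up: residual prediction error
    mu += lr * W.T @ error            # revise the hypothesis to reduce error

print("remaining error:", np.linalg.norm(sensation - W @ mu))
```

The point of the sketch is just the direction of information flow: predictions travel down, errors travel up, and perception is the settling of the hypothesis that quiets the error.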

Takeaways

I agree with a lot of what Clark says in this article, though there are a few points that I disagree with.

First, “predictive” is not the same as “generative” (though all generative models are predictive, in a sense). Discriminative models are also predictive, in the sense that you receive an input $x$ and predict an output $y$. Generative models, by contrast, model $x$ and $y$ jointly, making it possible to predict $y$ from $x$ but also the other way around (predicting $x$ from $y$). Thus, generative models have more predictive power in the sense that they can capture more relationships within the data, but that doesn’t mean discriminative models aren’t predictive. Clark uses “predictive” and “generative” interchangeably, but I think that is a mistake.
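
As a toy illustration of the two directions (the Gaussian data, names, and parameters here are all my own invention), a generative model of $p(x, y)$ can both classify and synthesize, whereas a discriminative model gives only $p(y \mid x)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: scalar x, binary label y, Gaussian class-conditionals.
n = 500
y = rng.integers(0, 2, size=n)
x = np.where(y == 1, 2.0, -2.0) + rng.normal(size=n)

# Generative model: estimate p(y) and p(x | y).
prior = np.array([np.mean(y == 0), np.mean(y == 1)])
means = np.array([x[y == 0].mean(), x[y == 1].mean()])
stds = np.array([x[y == 0].std(), x[y == 1].std()])

def gaussian_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Direction 1: predict y from x via Bayes' rule, p(y | x) ∝ p(x | y) p(y).
def predict_y(x_new):
    joint = gaussian_pdf(x_new, means, stds) * prior
    return joint / joint.sum()

# Direction 2: predict (sample) x from y, which p(y | x) alone cannot do.
def sample_x(y_new, size=5):
    return rng.normal(means[y_new], stds[y_new], size=size)

print("p(y | x=1.5):", predict_y(1.5))
print("samples of x given y=0:", sample_x(0))
```

A discriminative model such as logistic regression fit to the same data would answer the first question about as well, but it has nothing to say about the second.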

I definitely agree that having a generative model is incredibly powerful, because it allows us to make a priori predictions in the absence of data. However, generative models can be more difficult to learn, and while they may be more general overall, that generality may come at the cost of precision on specific prediction tasks. Thus, discriminative models are important too, and it is almost certainly incorrect to say that the brain never makes use of simple mappings.

I’m not sure I agree with Clark’s characterization of action. I do like the idea that our minds make predictions and then we use action to make those predictions come true, but I think it is important to distinguish between predictions of the generative process in the world (i.e., what will happen next in the absence of action) and predictions of how that default process would change if we were to act on it.
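
In other words, I would separate a passive dynamics model, roughly $p(s_{t+1} \mid s_t)$, from an action-conditioned model, $p(s_{t+1} \mid s_t, a_t)$. A deliberately tiny sketch (the linear world and its coefficients are invented purely for illustration):

```python
# Two kinds of prediction about a toy scalar world s' = 0.9*s + 0.5*a.

def predict_default(s):
    """What happens next if we do nothing (a = 0)."""
    return 0.9 * s

def predict_under_action(s, a):
    """How the default trajectory changes if we intervene with action a."""
    return 0.9 * s + 0.5 * a

s = 1.0
print("default next state:        ", predict_default(s))
print("next state if we act (a=2):", predict_under_action(s, 2.0))
```

“Acting to make a prediction come true” only makes sense relative to the second kind of model; the first kind tells you what you would have to counteract.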

I am also not sure I agree with the idea that the brain is trying to reduce error, especially since “error” is not a well-defined term. Clark seems to use it in the sense of reducing entropy, but there are many other ways it could be used. By definition, any learning system is trying to optimize some objective function (the “error”), so while the claim is technically true, it isn’t a new idea, and I don’t feel it provides much explanatory power.
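
To make the ambiguity concrete, here are three distinct quantities that routinely get called “error” (the grouping is mine, not Clark’s):

```latex
\begin{align}
  \mathcal{L}_{\mathrm{sq}}   &= \lVert s - \hat{s} \rVert^{2}
    && \text{squared prediction error} \\
  \mathcal{L}_{\mathrm{surp}} &= -\log p(s)
    && \text{surprisal of } s \text{ under a model } p \\
  H[p] &= -\mathbb{E}_{s \sim p}\!\left[\log p(s)\right]
    && \text{entropy: expected surprisal}
\end{align}
```

Under a Gaussian model with fixed variance, minimizing the first two coincides; the third is the one closest to Clark’s usage. Since these objectives come apart in general, saying that the brain “reduces error” constrains the theory very little.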

The more important component of Clark’s argument concerns the large role generative models probably play. Yet Clark doesn’t go into detail about how those models are actually constructed, beyond referring to a few proposals like the Helmholtz machine. That is where I think the truly difficult problems lie: determining what the generative models are, how we construct them in the first place, and when a generative model is what gets constructed as opposed to a discriminative one. Particularly difficult questions arise in domains where the models must handle incredibly high-dimensional data and parse it into complex, structured representations. Saying the brain constructs generative models is a useful starting point (and one that I agree with), but it doesn’t say anything about what the actual structure of those models is.
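
For what it’s worth, the Helmholtz machine he cites does come with a concrete learning story: a recognition network and a generative network trained against each other with the wake-sleep algorithm. A bare-bones sketch, where the binary toy data, layer sizes, and learning rate are all my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

n_visible, n_hidden, lr = 6, 3, 0.1

# Generative model: p(h), then p(x | h).
b_gen = np.zeros(n_hidden)                          # hidden biases
W_gen = rng.normal(0, 0.1, (n_visible, n_hidden))   # hidden -> visible
c_gen = np.zeros(n_visible)                         # visible biases

# Recognition model: q(h | x).
W_rec = rng.normal(0, 0.1, (n_hidden, n_visible))   # visible -> hidden
b_rec = np.zeros(n_hidden)

# Toy dataset: noisy copies of two binary prototypes.
protos = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
data = protos[rng.integers(0, 2, 500)]
data = np.abs(data - (rng.random(data.shape) < 0.05))  # flip 5% of bits

for x in data:
    # Wake phase: recognize h from real data, improve the generative model.
    h = sample(sigmoid(W_rec @ x + b_rec))
    b_gen += lr * (h - sigmoid(b_gen))
    x_pred = sigmoid(W_gen @ h + c_gen)
    W_gen += lr * np.outer(x - x_pred, h)
    c_gen += lr * (x - x_pred)

    # Sleep phase: dream (h, x) from the generative model, improve recognition.
    h_dream = sample(sigmoid(b_gen))
    x_dream = sample(sigmoid(W_gen @ h_dream + c_gen))
    h_pred = sigmoid(W_rec @ x_dream + b_rec)
    W_rec += lr * np.outer(h_dream - h_pred, x_dream)
    b_rec += lr * (h_dream - h_pred)

# A fantasy from the trained generative model should resemble a prototype.
h = sample(sigmoid(b_gen))
print("dreamed pattern:", sample(sigmoid(W_gen @ h + c_gen)))
```

Even so, this only answers how to fit the parameters of a fixed architecture; where the structure itself comes from is exactly the question that, I would argue, Clark leaves open.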