The experience of force: the role of haptic experience of forces in visual perception of object motion and interactions, mental simulation, and motion-related judgments
White, P. A. (2012). The experience of force: The role of haptic experience of forces in visual perception of object motion and interactions, mental simulation, and motion-related judgments. Psychological Bulletin, 138(4), 589–615. doi:10.1037/a0025587
Summary
In this paper, White proposes a theory of action and perception that is based on the notion of force. Specifically, he argues that during our interactions with the world, we perceive force through our haptic system (along with other sensory modalities), and these perceptions get stored in memory along with the actions associated with them. Then, when we perceive new situations, we activate these stored representations, which allows us to make predictions and judgments about motion and other factors.
First, White discusses evidence for forward models of action in the motor system, as well as evidence for the role of mechanoreceptor feedback. What it sounds like he proposes is a sort of forward model like this:

$$\mathbf{x}_{t+1}, \mathbf{s}_{t+1} = f(\mathbf{x}_t, \mathbf{u}_t)$$

where $\mathbf{x}$ is the state of the system, $\mathbf{s}$ is the sensory information (e.g. from the haptic system), and $\mathbf{u}$ are the controls (forces) of the system. A prediction error for the sensory information (e.g. mechanoreceptor feedback) is also computed:

$$\boldsymbol{\delta}_t = \hat{\mathbf{s}}_t - \mathbf{s}_t$$

where $\mathbf{s}_t$ is the predicted sensory information and $\hat{\mathbf{s}}_t$ is the true sensory information. The feedback $\boldsymbol{\delta}_t$ is thus the error signal, which will be zero when our predictions of force are accurate. White also argues that perceptions of additional object properties (texture, rigidity, mass, etc.) are computed based on sensory information from mechanoreceptors. I'll denote these properties as $\boldsymbol{\pi}$.
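To make this concrete, here is a minimal sketch in Python of such a forward model and its sensory prediction error. The linear dynamics and all names (`ForwardModel`, `prediction_error`) are my own illustrative choices, not anything specified in the paper:

```python
import numpy as np

class ForwardModel:
    """Toy forward model: predicts the next state and the expected
    sensory feedback given the current state and a motor command.
    The linear dynamics here are purely illustrative."""

    def __init__(self, A, B, C):
        self.A = A  # state transition (state -> state)
        self.B = B  # control matrix (force -> state change)
        self.C = C  # observation matrix (state -> predicted sensation)

    def predict(self, x, u):
        x_next = self.A @ x + self.B @ u   # predicted next state
        s_pred = self.C @ x_next           # predicted sensory feedback
        return x_next, s_pred

def prediction_error(s_true, s_pred):
    """delta_t = true sensation minus predicted sensation;
    zero when the force prediction was accurate."""
    return s_true - s_pred

# Example: 2-D state, scalar force
model = ForwardModel(A=np.eye(2), B=np.array([[1.0], [0.0]]), C=np.eye(2))
x1, s1 = model.predict(x=np.zeros(2), u=np.array([0.5]))
delta = prediction_error(s_true=np.array([0.5, 0.0]), s_pred=s1)  # ~zero
```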
All of these different sources of information are stored in long-term memory, roughly (it seems) in the form of tuples such as:

$$\mathbf{m}_t = (\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\delta}_t, \boldsymbol{\pi}_t)$$
White describes these as:
> A stored representation of an action on an object is a multimodal episodic trace combining haptic information such as the disposition and movement of the limbs during execution of the action, visual information about body movement and the associated motion of the object acted on, auditory information such as sounds elicited by contact between extremity and object, and in principle, information in any sensory modality. Internally available information such as the content of the forward model also forms part of the representation. (p. 607)
Importantly, we store the prediction error $\boldsymbol{\delta}_t$ for sensory information, rather than the raw sensory information itself. These stored representations are later activated by matching against similar perceptual stimuli (e.g. visual stimuli).
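Under the same caveat as before, here is one way the stored tuples and their activation by matching might look in Python. The field names are my own, and the exemplar-style nearest-neighbor retrieval is my stand-in for White's (unspecified) matching process:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Trace:
    """Hypothetical episodic trace m_t, one per timestep."""
    x: np.ndarray      # state (limb/object configuration)
    u: np.ndarray      # motor command (force)
    delta: np.ndarray  # sensory prediction error, not raw sensation
    pi: np.ndarray     # inferred object properties (mass, rigidity, ...)

def retrieve(memory, x_query, k=1):
    """Activate the k stored traces whose state best matches a new
    percept. Nearest-neighbor distance is one simple stand-in for
    'activation by matching'."""
    distances = [np.linalg.norm(m.x - x_query) for m in memory]
    order = np.argsort(distances)
    return [memory[i] for i in order[:k]]
```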
White uses this formulation of stored representations to offer a unifying account for several lines of research:
- The storing of the sensory prediction error, rather than the raw sensory information, predicts that when making judgments about force in a situation where one moving object acts on a stationary one, we attribute force to the moving object (because we do store $\mathbf{u}_t$) but not to the stationary object (because, for static objects, the sensory prediction error should be zero). If both objects are moving, however, we should additionally attribute force from the second object to the first, because its sensory prediction error is nonzero. This explains, for example, Michottean launching effects; a toy version of this attribution rule is sketched after this list.
- To the extent that visually perceived motion matches stored representations corresponding to our own actions, we should perceive that motion as internally caused. This extends to perception of biological motion as well. Importantly, biologically generated motion has a different velocity profile than, for example, two nonbiological objects colliding, so visual motion that matches the biological velocity profile should be interpreted as more biological.
- Representational momentum
- Perception of inanimate entities as intentional
- Mental simulation
- Perception of mass
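Here is the toy attribution rule promised above. The two-object framing, the velocity threshold, and the function name `force_attributions` are all my own simplifications of the idea in the first bullet:

```python
import numpy as np

def force_attributions(v_actor, v_patient, eps=1e-6):
    """Toy two-object version of the attribution rule: force is
    attributed to an object when it is moving (its motion maps onto a
    stored motor command u_t), while a stationary object yields a zero
    sensory prediction error and so is credited with no force.
    v_actor, v_patient: pre-contact velocities; eps is arbitrary."""
    attributions = []
    if np.linalg.norm(v_actor) > eps:
        attributions.append("actor exerts force on patient")
    if np.linalg.norm(v_patient) > eps:  # nonzero prediction error
        attributions.append("patient exerts force on actor")
    return attributions

# Michottean launching: only the actor moves before contact, so force
# is attributed in one direction only.
print(force_attributions(v_actor=np.array([1.0, 0.0]),
                         v_patient=np.zeros(2)))
```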
Takeaways
This is a surprisingly consistent and satisfying account of how perception arises from the combination of visual and haptic feedback. Assuming people do store information as something like $\mathbf{m}_t$ defined above, and they have access to forward and inverse models, it should be possible to reconstruct $\mathbf{u}_t$ (from both $\mathbf{x}_t$ and $\mathbf{x}_{t+1}$; see the relation below), which is consistent with White's assertions. I am skeptical, though, that all we are doing is "storing" and "matching" representations. It is not at all clear to me how it would work to match the motion of a 2D ball (e.g. in the Michotte experiments) to the stored motion of ourselves. Additionally, it sounds like White is advocating for something like an exemplar model, but I find it much more likely that we use our experiences to build structured forward or inverse models. There may be multiple forward models (as suggested by Kawato) that are perhaps combined in certain ways, but given that there is evidence for some generalization (also described by Kawato), I would be very surprised if all that was going on was just storing and matching exemplars.
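For concreteness, the reconstruction I have in mind is just the inverse-model relation (my notation, not White's):

$$\mathbf{u}_t = f^{-1}(\mathbf{x}_t, \mathbf{x}_{t+1})$$

where $f$ is the forward model sketched above, so that an observed state transition suffices to recover the force that produced it.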