Kosslyn, S. M. (1988). Aspects of a Cognitive Neuroscience of Mental Imagery. Science, 240(4859), 1621–1626. Retrieved from http://www.jstor.org/stable/1701012

Summary

Kosslyn presents neuroscientific evidence for how mental images are generated in the brain. He argues there are two separate systems:

  1. The first system encodes object shape, which may be either a low-resolution image of the entire object or higher-resolution images of object parts.
  2. The second system encodes object location, which may be either represented categorically (e.g. using symbolic/linguistic terms, such as “next to” or “inside of”) or via absolute coordinates.

These two systems interact so that when a mental image is generated, “the patterns in the images are built up by activating parts individually and that parts are imaged in roughly the order in which they are typically drawn” (p. 1622).

In particular, Kosslyn hypothesizes (and provides evidence for) the left cerebral hemisphere being better at arranging object parts according to categorical relations, while the right cerebral hemisphere is better at arranging object parts according to absolute locations. Both hemispheres are equally good at generating whole objects (that are not broken down into individual parts).
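
To make this distinction concrete, here is a minimal sketch (my own illustration, not a formalism from the paper) of a part-based representation that stores both a categorical relation and an absolute coordinate for each part:

```python
# Sketch of the two encoding systems: each part carries a shape entry at some
# resolution, plus both a categorical (symbolic) relation and an absolute
# coordinate. All names and values here are my own illustrative choices.
from dataclasses import dataclass

@dataclass
class Part:
    name: str
    resolution: str  # "low" (whole-object) or "high" (part-level)

@dataclass
class Placement:
    part: Part
    categorical: tuple  # symbolic relation, e.g. ("attached_to_top_of", ...)
    coordinate: tuple   # absolute position, e.g. (0.0, 2.0)

# An uppercase "F" built from strokes, roughly in drawing order (cf. p. 1622).
f_image = [
    Placement(Part("vertical_stroke", "high"), ("anchor", None), (0.0, 0.0)),
    Placement(Part("top_bar", "high"),
              ("attached_to_top_of", "vertical_stroke"), (0.0, 2.0)),
    Placement(Part("middle_bar", "high"),
              ("attached_to_middle_of", "vertical_stroke"), (0.0, 1.0)),
]

# Left-hemisphere-style arrangement: use the categorical relations.
def arrange_categorically(image):
    return [(p.part.name, p.categorical) for p in image]

# Right-hemisphere-style arrangement: use the absolute coordinates.
def arrange_by_coordinates(image):
    return [(p.part.name, p.coordinate) for p in image]

print(arrange_categorically(f_image))
print(arrange_by_coordinates(f_image))
```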

Methods

Kosslyn reports results from both split-brain patients and undergraduates. Both types of experiments rely on a technique called lateralization, in which the stimulus is presented in only the left or right half of the visual field for a brief period of time (150 ms; short enough that participants cannot move their eyes to bring the stimulus into the other half of the visual field). In split-brain patients, there is no communication between the hemispheres, so only one hemisphere receives the visual information. For the undergraduates, whose corpora callosa are intact, Kosslyn measured reaction time: a stimulus presented to the right hemisphere that requires the left hemisphere for processing should elicit slower reaction times than the same stimulus presented directly to the left hemisphere.
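
The reaction-time logic boils down to an interhemispheric transfer cost. A toy sketch (the numbers are made up by me, not Kosslyn's data):

```python
# If a task needs the left hemisphere, a stimulus shown to the left visual
# field (which projects to the RIGHT hemisphere) must cross the corpus
# callosum first, adding a transfer cost to the reaction time.
BASE_RT_MS = 500       # hypothetical baseline reaction time
TRANSFER_COST_MS = 30  # hypothetical cost of crossing the corpus callosum

def predicted_rt(receiving_hemisphere, processing_hemisphere):
    rt = BASE_RT_MS
    if receiving_hemisphere != processing_hemisphere:
        rt += TRANSFER_COST_MS  # information must transfer between hemispheres
    return rt

# Task handled by the left hemisphere:
print(predicted_rt("right", "left"))  # left visual field -> slower (530 ms)
print(predicted_rt("left", "left"))   # right visual field -> faster (500 ms)
```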

For the split-brain patient J.W., one task was to observe a lowercase probe letter and then determine whether the uppercase version of the letter had any curved lines. With the left hemisphere, J.W. performed perfectly; with the right hemisphere, performance was significantly reduced. This result suggests that the left hemisphere is involved in generating images that are composed of parts (e.g., strokes). To supplement this finding, a second experiment involved judging the dimensions of holistic objects (e.g., whether an object like a belt buckle is wider than it is tall). In this case, both hemispheres performed equally well.
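
A minimal sketch of the task structure (the classification of which uppercase letters contain curves is my own, not taken from the paper):

```python
# Given a lowercase probe, decide whether the imaged uppercase version
# contains any curved strokes. My own rough classification of the alphabet:
CURVED_UPPERCASE = set("BCDGJOPQRSU")

def uppercase_has_curves(lowercase_probe):
    return lowercase_probe.upper() in CURVED_UPPERCASE

print(uppercase_has_curves("f"))  # False: "F" is all straight strokes
print(uppercase_has_curves("b"))  # True: "B" has curved strokes
```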

For the undergraduates, the task was to view a lowercase letter in a grid. Probe points appeared in some of the grid cells, and participants had to determine whether the uppercase version of the letter would cover the probe points. As predicted, responses were faster when the probe was presented to the left hemisphere rather than the right hemisphere. If the letter was presented in a simple box rather than a grid, participants were forced to remember the absolute positions of the letter parts (rather than their symbolic relations), and in this case the left hemisphere performed worse than the right hemisphere.
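
The grid task has a natural computational reading: represent the uppercase letter as the set of grid cells it covers and check membership for each probe. A sketch, with an illustrative cell layout of my own:

```python
# 5x3 grid, (row, col) with row 0 at the top; cells covered by an uppercase "F".
F_CELLS = {
    (0, 0), (0, 1), (0, 2),  # top bar
    (1, 0),
    (2, 0), (2, 1),          # middle bar
    (3, 0),
    (4, 0),                  # vertical stroke
}

def probe_covered(letter_cells, probe):
    """Would the imaged uppercase letter cover this probe cell?"""
    return probe in letter_cells

print(probe_covered(F_CELLS, (0, 2)))  # True: end of the top bar
print(probe_covered(F_CELLS, (4, 2)))  # False: empty corner of the grid
```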

Algorithm

n/a

Takeaways

This is a collection of really elegant results showing how mental images are stored and produced in the brain. It would be interesting to come up with a model that efficiently trades off between the different forms of representation to solve these tasks. Such a model should store representations at different levels of granularity (Joe Austerweil has some related work, which I want to reread now) and should additionally store representations using both symbolic relations and absolute positions. This is actually pretty similar to Brenden Lake’s work on motor program induction. I’m not sure how that model represents where the parts go (I will have to go back and read the details more carefully), but I think it is symbolic/categorical. So it would be interesting to try to augment that sort of model with the capability for absolute positioning as well, and with different granularities of parts.

I also wonder how these types of representations interact with manipulation of the images. For example, if I say “imagine an F lying on its side”, do people imagine an upright F and then rotate it holistically? Or do they imagine one part of the F, rotate it, and then attach the other parts to it? Or do they imagine all the parts rotated independently, and then attach them to each other? Subjectively, I feel like I first generate the image and then rotate it, but I’m not sure subjective reports are necessarily reliable. Based on the work from Just & Carpenter, the answer seems more likely to be that one part is rotated, and then the others are “attached”.
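
To make the alternatives concrete, here is a sketch of the holistic versus part-by-part rotation strategies (the strokes and coordinates are my own illustration). Both end in the same image; they differ only in their intermediate states:

```python
import math

def rotate_point(p, degrees):
    """Rotate a 2D point about the origin."""
    th = math.radians(degrees)
    x, y = p
    return (round(x * math.cos(th) - y * math.sin(th), 6),
            round(x * math.sin(th) + y * math.cos(th), 6))

# Strokes of an upright "F" as (start, end) coordinate pairs.
strokes = {
    "vertical": ((0, 0), (0, 2)),
    "top_bar": ((0, 2), (1, 2)),
    "middle_bar": ((0, 1), (1, 1)),
}

# Holistic strategy: assemble the whole F, then rotate every point at once.
holistic = {name: tuple(rotate_point(p, 90) for p in seg)
            for name, seg in strokes.items()}

# Part-by-part strategy: rotate one stroke and attach the others one at a
# time, so intermediate images contain only some of the parts.
incremental = {}
for name, seg in strokes.items():
    incremental[name] = tuple(rotate_point(p, 90) for p in seg)
    print(f"after attaching {name}: {sorted(incremental)}")

# Both strategies end in the same image; only the process differs.
print(holistic == incremental)  # True
```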