Bergen, B. K., Lindsay, S., Matlock, T., & Narayanan, S. (2007). Spatial and linguistic aspects of visual imagery in sentence comprehension. Cognitive Science, 31(5), 733–64. doi:10.1080/03640210701530748

Summary

In this paper, Bergen et al. examine some of the ways that language triggers the use of visual mental imagery. In particular, they look at the Perky effect, which they describe as follows:

In a seminal study, Perky (1910) asked participants to imagine seeing an object (such as a banana or leaf) while they were looking at a blank screen. At the same time, unbeknownst to them, an actual image of the same object was projected on the screen, starting below the threshold for conscious perception, but with progressively greater and greater illumination. Perky found that many participants continued to believe that they were still just imagining the stimulus and failed to recognize that there was actually a real, projected image even at levels where the projected image was perfectly perceptible to participants not simultaneously performing imagery. (pg. 736)

The Perky effect can be explained based on competition for the same set of resources: visual perception uses some of the same resources as mental imagery, and so if mental imagery is engaged, it can interfere with our ability to perceive things. This paradigm can be extended to measure how mental imagery is being used in language comprehension, by having people listen to a sentence just before performing a visual perception task.

Using the above paradigm, Bergen et al. find that nouns and verbs that are associated with concrete motion or spatial locations (e.g. as in The ceiling shook or The ball dropped) do invoke mental imagery in the relevant location of the visual field. In contraast, metaphorical motion does not produce this effect (e.g. The market fell), nor do abstract verbs that are associated with up/down but which do not literally mean up/down (e.g. The quantity dwindled). Their conclusion is that concrete sentences evoke visual mental imagery, but metaphorical and abstract sentences—even if they do involve some form of motion of spatial location—do not engage imagery (or at least not in the same way).

Methods

In all experiments, the task was to listen to a sentence, after which a circle or square would appear in one portion of the screen (up, down, left, right). Participants had to judge whether the object was a circle or square. The hypothesis was that if mental imagery was being used in a particular portion of the visual field when the sentence was read, then it would take participants longer to judge the object if that object were in the same part of the visual field than if it was not.

Experiment 1 looked at up/down verbs (e.g. dropped, rose). They did find that sentences with up verbs resulted in longer processing when the object was in the upper part of the screen than when it was in the lower part of the screen (5/5), and that sentences with down verbs resulted in longer proceessing when the object was in the lower part of the screen than when it was in the upper part of the screen (3/5).

Experiment 2 looked at up/down nouns (e.g. ceiling, cellar). They found essentially the same results as before: sentences with up nouns resulted in longer processing times when the shape was in the upper part of the screen (3/5), and sentences with down nouns resulted in longer processing times when the shape was in the lower part of the screen (4/5).

Experiment 3 looked at metaphorical motion (e.g. The market fell), but did not find the effect either for metaphorical up verbs (2/5) or for metaphorical down verbs (2/5).

Experiment 4 looked at abstract verbs with up/down connotations (e.g. dwindled, increased) and again did not find the effect either for abstract up verbs (3/5) or for abstract down verbs (2/5).

Experiment 5 looked at axis (horizontal versus vertical) effects, rather than directional (up versus down) effects. They also did not find an effect here.

Algorithm

n/a

Takeaways

First—wow, the Perky effect is awesome! I have heard variants of that before (e.g. it is hard to read and listen to language simultaneously) but I hadn’t heard of people actually conflating mental imagery and true perceptions.

This paradigm used by Bergen et al. is really neat, though the fact that they didn’t find the expected effect for all of the individual sentences (even if they found it overall) is somewhat less compelling. It would be interesting to use eyetracking with these experiments to see if people are actually attending to the portion of the field that is hypothesized, particularly in the case of the abstract/metaphorical sentences—not finding saccades in those cases would strengthen their hypothesis that people are not actually constructing a concrete visual scene in the same way.

On a different note, given that these types of sentences do seem to evoke mental imagery, it would be interesting to try to reverse some of the image captioning algorithms that are coming out of the machine learning literature in order to produce a type of simulated mental imagery, which perhaps could then augmented as more information comes in—so the first sentence might retrieve an image, which is then parsed into components, and as more sentences are heard, they modify the current scene to make it consistent with the linguistic description. Then, that scene could be used to do further reasoning about what is being said—e.g. verifying whether a description about it is true or not.