Just, M. A., & Carpenter, P. A. (1976). Eye fixations and cognitive processes. Cognitive Psychology, 8, 441–480. doi:10.1016/0010-0285(76)90015-3

Summary

Just & Carpenter explore three different tasks that rely on high-level cognitive processing, and recorded the eye movements of participants while performing these tasks. Their main hypothesis is that participants fixate on a region that is involved in whatever computation is currently taking place in the mind. I’m only going to go through the first of their experiments, however, as it is the most related to mental imagery and I feel the other two experiments are somewhat less revealing. Regarding the first mental rotation experiment, they ask three questions:

  1. How does the subject know which parts of the figure are to be rotated into each other?
  2. How does the subject know how far to rotate one of the objects?
  3. Once the required rotation has been performed, how does the subject know whether the two figures represent the same object or not?

Methods

Participants performed essentially the same task as that from Shepard & Metzler, though with far less data (120 trials vs 1600) and also fewer subjects (3 vs 8). Just & Carpenter find similar results, though not identical, results (i.e. monotonically increasing response times).

Algorithm

Just & Carpenter propose the following model of cognitive processing:

  1. First, participants engage in a “search” procedure, in which they look back and forth between the two images to find a corresponding segment of the objects.
  2. Second, participants engage in the “transform and compare” procedure, in which they mentally rotate the images toward each other in increments of $50^\circ$ until the images are within $25^\circ$ of each other.
  3. Third, participants engage in the “confirmation” procedure, in which they check that the other segments in the images also align. There are two proposed ways that they do this: either by executing a transformation on an additional segment of the stimulus, or by checking that the relations between the segments and the central part of the image are the same.

Takeaways

I think it makes sense that people would only actually focus on a single portion of the image at a time, rather than holistically transforming the entire image. I feel though that Just & Carpenter haven’t answered their first question, which is how exactly the “search” procedure works. They do bring up an interesting point regarding this, which is that one possible reason for the increase in response time is that participants pick one segment in one image, and then try to find that segment in the other image, first looking at the same location in the second image—so by design, it will take them longer to find the corresponding segment if it is further away (as it would be due to rotation).

I wonder how this procedure works with other types of objects, that don’t necessarily have such discrete “segments” as the stimuli used in this experiment. I suppose that this question might not actually be that relevant for realistic stimuli, though. If we use something like mental rotation to e.g. identify similar objects in the world, then we probably first rely on high-information, high-level features (color, texture, overall shape). Only if those features are absent (as is the case with the Shepard & Metzler stimuli) would one need to do something more detailed like a mental rotation.

One takeaway from all of this, I feel, is that mental rotation isn’t really all that useful/important in and of itself—I would hypothesize that we are not able to do rotations for the sake of doing rotations, but that we need that sort of capability in order to reason about how objects interact with each other or how they should be manipulated (e.g. how should I rotate this object so that it fits in the drawer?). So under a rational analysis the question isn’t, “how does mental rotation work?”, but rather, “what problem is mental rotation solving?”. It would be nice to be able to come up with a realistic task (or set of tasks), for example, in which mental rotation is clearly the right solution, and then see how people do it, and then compare models that do or do not use mental rotation to solve the task. Maybe some sort of bin-packing problem? I am sure someone has studied that sort of thing in cognitive science; I should remember to look up some of that research.