Summary

In this paper, Schwartz asks: is physical imagery based on kinematics or dynamics? Specifically, does it only rely on spatial information (kinematic model, KM), or does it also incorporate information about things like forces (dynamic model, DM)? Schwartz shows through a series of four experiments that physical imagery is consistent with the dynamics account.

Methods

In Experiment 1, participants completed two tasks in one of two orders. One was the “judge” task: view two differently sized glasses of water (filled to the same level) and determine which glass would need to tilt farther for the water to spill out. The other was the “tilt” task, in which participants physically rotated empty glasses with their eyes closed while imagining water in them. The hypothesis was that if people were using KM, then performing the “judge” task before the “tilt” task should cause their judgments to affect the outcome of the tilt (as the tilt would just be based on a spatial outcome). If people were using DM, then doing the “judge” task first should interfere with their mental imagery (which, based on previous work, is accurate for tilting but inaccurate for judging). The results were consistent with DM: when tilting first, people (correctly) tilted the thin glass further, but when judging first, they either tilted the wide glass further or tilted the two glasses the same amount. People were almost entirely incorrect in their explicit judgments, and judgments did not correspond to tilts.
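For reference, the physically correct answer in the tilt task follows from simple geometry. The sketch below is not from the paper: it assumes an idealized cylindrical glass whose flat water surface pivots about the central axis (valid only while the surface does not reach the bottom of the glass). The function name and the dimensions are made up for illustration.

```python
import math

def spill_angle_deg(diameter, gap_to_rim):
    """Tilt from vertical (in degrees) at which water first reaches the rim.

    Assumption: ideal cylinder, so the flat water surface pivots about the
    central axis and meets the rim when tan(theta) = gap_to_rim / radius.
    """
    return math.degrees(math.atan2(gap_to_rim, diameter / 2))

# Hypothetical dimensions (cm): same gap between water and rim, different widths.
thin = spill_angle_deg(diameter=4.0, gap_to_rim=2.0)
wide = spill_angle_deg(diameter=8.0, gap_to_rim=2.0)
print(f"thin: {thin:.1f} deg, wide: {wide:.1f} deg")
```

Under these assumptions the thinner glass must tilt farther before spilling, which matches the direction of the correct tilts reported above.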

In Experiment 2, participants performed the “tilt” task but were told to imagine that the liquid in the glass was either water or molasses. The hypothesis was that if people are using KM, then the type of liquid should have no effect; if they are using DM, then they should tilt the molasses glasses further, because molasses moves more slowly than water. The results were again consistent with the DM account. People tilted the thin glasses further than the wide glasses (as expected) and also tilted the molasses glasses further than the water glasses. Schwartz makes an important point in the discussion of this experiment:

People rely on the temporal coordinations of physical imagery to allow inferences to emerge; they do not first decide what the inference should be and then adjust the timing of things to portray that inference. (pg. 449)

In Experiment 3, Schwartz had people tilt glasses normally, or tilt them starting from a horizontal position. He also had people perform the tasks lying down. In addition to the tilting task, Schwartz had people rate the quality of their mental imagery, in order to gauge how well they were able to imagine the water in the glass. The idea was to see whether gravity would have an effect on people’s ability to perform the task. He found that, as before, people were able to perform the task when both they and the glass were upright. Interestingly, people could also perform the task if they were lying down, provided the glass was still upright with respect to gravity. They could not perform the task if the glass started out horizontal, saying things like “the water began to pour out when I started to tilt the glass” (pg. 452). When the glass was sideways, imagery quality was typically high initially (before they started tilting) but degraded as they began to tilt the glass. These results also support the DM account.

Experiment 4 looked at another manipulation to test how perceptual information affects imagery. In this experiment, Schwartz had regular glasses and weighted glasses, and hypothesized that people would turn the weighted glasses less than the regular glasses because the extra weight introduces a torque that increases as the amount of rotation increases. In particular, as the water level decreases, people should increasingly underrotate because they have to rotate further into the torque. As predicted, this is what Schwartz found. He also ran a control version in which he had people rotate the glasses to a specified rotation (without water, 45 degrees). People were able to perform this task nearly perfectly, suggesting that the results from the main experiment were not due to an inability to represent the angle of the glass. Rather, Schwartz suggests that the effect is due to a relationship between the rate of work exerted in turning the glass and the rate of change of the water. As the glass is turned, the rate of work increases, causing the imagined water to change more quickly, and thus causing people to underrotate.

Algorithm

n/a

Takeaways

These experiments indicate that dynamic information is clearly incorporated in our ability to visualize objects and our actions on those objects. Schwartz makes the argument that, contrary to other mental imagery accounts, we do not represent transformations by computing the target spatial orientation, but that we represent the control and apply that control until the orientation is achieved. I like this account, but I still don’t quite know how it fits into the mental rotation account. It cannot be that people just randomly pick a direction of rotation, as then their response times would average out to be constant.

Perhaps, as argued by Just and Carpenter, people do rely on some sort of feature matching in order to determine the direction of rotation—but not the final orientation. Then, they apply the relevant control in order to move the shape in that direction until they match the correct orientation. It’s still not clear to me exactly how you would tell if you’ve reached the correct orientation… I suppose if people only rotate one part of the shape, then that local piece would be easy to compare. Then, once the local rotation is found, people presumably know what the angle is and can rotate the rest of the shape to that angle, and don’t necessarily need to provide the control.

The control account is very compelling, but I wonder if there are really some cases where we use purely visual imagery, and other cases where we use dynamic imagery. For example, in the mental rotation case I just described, could one component of that be using dynamic imagery, while another component just uses spatial imagery? Would it be possible to test for this? Do these two cases differ in important ways (i.e., does it really matter if we only use one or the other)? I would expect that it does. I have read a lot of stuff in robotics that is based on knowing the goal state and applying control to get there, but I have read less about simply applying control until some conditions are satisfied (i.e. it is not explicitly a goal state in terms of pose). I need to think more about how these two things are different (or if they are different at all). Perhaps the latter isn’t really actually that different—it’s just that some higher level planner is making the goal states be not very far away from the current state.
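One way to make the distinction concrete is a toy sketch, under heavy assumptions. Both functions below are hypothetical illustrations, not Schwartz's model and not any particular robotics method: the first computes a goal pose up front, while the second applies a fixed control step until a stopping condition holds and never represents the goal pose explicitly. The idealized cylinder geometry and all parameter values are assumptions.

```python
import math

def goal_state_tilt(radius, gap):
    """Goal-state strategy: compute the target tilt angle (radians) first."""
    return math.atan2(gap, radius)

def control_until_spill(radius, gap, step=math.radians(0.1)):
    """Control strategy: keep tilting in small steps until the imagined water
    reaches the rim; the final angle is never computed in advance."""
    angle = 0.0
    # In an ideal cylinder, water touches the rim once the rise of the
    # surface at the low edge (radius * tan(angle)) equals the gap.
    while radius * math.tan(angle) < gap:
        angle += step
    return angle

# Hypothetical dimensions (cm).
target = goal_state_tilt(radius=2.0, gap=2.0)
reached = control_until_spill(radius=2.0, gap=2.0)
print(math.degrees(target), math.degrees(reached))
```

With a small step the two strategies end at nearly the same angle, so in this toy case the difference lies in what is represented (a pose versus a stopping condition) rather than in the outcome.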